We argue in this paper that many common adverbial phrases generally taken to signal
a discourse relation between syntactically connected units within discourse structure, in- 
stead work anaphorically to contribute relational meaning, with only indirect dependence 
on discourse structure. This allows a simpler discourse structure to provide scaffolding 
for compositional semantics, and reveals multiple ways in which the relational meaning 
conveyed by adverbial connectives can interact with that associated with discourse struc- 
ture. We conclude by sketching out a lexicalised grammar for discourse that facilitates 
discourse interpretation as a product of compositional rules, anaphor resolution and in- 
ference. 



Introduction 



It is a truism that a text means more than the sum of its component sentences. One source
of additional meaning is the set of relations taken to hold between adjacent sentences
"syntactically" connected within a larger discourse structure. However, it has been very
difficult to say what discourse relations there are, either theoretically (Mann and
Thompson, 1988; Kehler, 2002; Asher and Lascarides, forthcoming) or empirically (Knott,
1996).

Knott's empirical attempt to identify and characterise cue phrases as evidence for
discourse relations illustrates some of the difficulties. Knott used the following
theory-neutral test to identify cue phrases: For a potential cue phrase φ in naturally
occurring text, consider in isolation the clause in which it appears. If the clause appears
incomplete without an adjacent left context, while it appears complete if φ is removed,
then φ is a cue phrase. Knott's test produced a non-exhaustive list of about 200 different
phrases from
226 pages of text. He then attempted to characterize the discourse relation(s) conveyed by 
each phrase by identifying when (always, sometimes, never) one phrase could substitute 
for another in a way that preserved meaning. He then showed how these substitution 
patterns could be a consequence of a set of semantic features and their values. Roughly 
speaking, one cue phrase could always substitute for another if it had the same set of 
features and values, sometimes do so if it was less specific than the other in terms of its 
feature values, and never do so if their values conflicted for one or more features. 
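
The three-way substitution pattern just described can be sketched in a few lines. This is only an illustrative reading of the summary above, not Knott's actual procedure: the feature names and values (`polarity`, `source`) and the phrase assignments are hypothetical, and "less specific" is modelled, by assumption, as specifying a proper subset of the other phrase's features.

```python
from typing import Dict

Features = Dict[str, str]  # feature name -> value; an absent key means "unspecified"


def can_substitute(x: Features, y: Features) -> str:
    """Classify whether cue phrase x can replace cue phrase y."""
    shared = x.keys() & y.keys()
    if any(x[f] != y[f] for f in shared):
        return "never"        # values conflict on at least one feature
    if x == y:
        return "always"       # same features, same values
    if x.keys() < y.keys():
        return "sometimes"    # x agrees with y but is less specific
    return "unspecified"      # case not covered by the summary above


# Hypothetical feature assignments, for illustration only.
BUT = {"polarity": "negative"}
WHEREAS = {"polarity": "negative", "source": "semantic"}
SO = {"polarity": "positive", "source": "semantic"}

print(can_substitute(BUT, WHEREAS))  # less specific phrase replacing a more specific one
print(can_substitute(SO, WHEREAS))   # conflicting polarity values
```

Under this reading, any interaction between a cue phrase and the surrounding discourse would have to be folded into the feature values themselves, which is one way of seeing why the resulting graphs became complex.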

By assuming that cue phrases contribute meaning in a uniform way, Knott was led 
to a set of surprisingly complex directed acyclic graphs relating cue phrases in terms of 
features and their values, each graph loosely corresponding to some family of discourse 
relations. But what if the relational meaning conveyed by cue phrases could in fact 
interact with discourse meaning in multiple ways? Then Knott's substitution patterns 
among cue phrases may have reflected these complex interactions, as well as the meanings 
of individual cue phrases themselves. 
Figure 1

Possible discourse structures for Example 1. Each root and internal node is labelled by the type
of relation that Wiebe takes to hold between the daughters of that node. (i) uses an n-ary
branching sequence relation, while in (ii), sequence is binary branching.



This paper argues that cue phrases do depend on another mechanism for conveying
extra-sentential meaning - specifically, anaphora. One early hint that adverbial cue
phrases (called here discourse connectives) might be anaphoric can be found in an ACL
workshop paper in which Janyce Wiebe (1993) used the following example to question
the adequacy of tree structures for discourse.

(1) a. The car was finally coming toward him. 

b. He [Chee] finished his diagnostic tests, 

c. feeling relief. 

d. But then the car started to turn right. 

The problem she noted was that the discourse connectives "but" and "then" appear to link
clause (1d) to two different things: "then" to clause (1b) in a sequence relation - i.e., the
car starting to turn right being the next relevant event after Chee's finishing his tests
- and "but" to a grouping of clauses (1a) and (1c) - i.e., reporting a contrast between,
on the one hand, Chee's attitude towards the car coming towards him and his feeling
of relief and, on the other hand, his seeing the car turning right. (Wiebe doesn't give a
name to the relation she posits between (1d) and the grouping of (1a) and (1c), but it
appears to be some form of contrast.)

If these relations are taken to be the basis for discourse structure, some possible
discourse structures for this example are given in Figure 1. Such structures might seem
advantageous in allowing the semantics of the example to be computed directly by
compositional rules and defeasible inference. However, both structures are directed acyclic
graphs (DAGs), with acyclicity the only constraint on what nodes can be connected.
Viewed syntactically, arbitrary DAGs are completely unconstrained systems. They
substantially complicate interpretive rules for discourse, in order for those rules to account
for the relative scope of unrelated operators and the contribution of syntactic nodes with
arbitrarily many parents.¹

We are not committed to trees as the limiting case of discourse structure. For example,
we agree, by and large, with the analysis that Bateman (1999) gives of



1 A reviewer has suggested an alternative analysis of (1) in which clause (1a) is elaborated by clause
(1b), which is in turn elaborated by (1c), and clause (1d) stands in both a sequence relation and a
contrast relation with the segment as a whole. While this might address Wiebe's problem, the result
is still a DAG, and such a fix will not address the additional examples we present in Section 1.1,
where a purely structural account still requires DAGs with crossing arcs.
(2) ... (vi) The first to do that were the German jewellers, (vii) in particular Klaus
Burie. (viii) And Morris followed very quickly after, (ix) using a lacquetry tech- 
nique to make the brooch, (x) and using acrylics, (xi) and exploring the use of 
colour, (xii) and colour is another thing that was new at that time . . . 

in which clause (ix) stands in a manner relation with clause (viii), which in turn stands
in a succession (i.e., sequence) relation with clause (vi). This is illustrated in Figure 2.
It is a DAG (rather than a tree), but without crossing dependencies.

[Figure 2 diagram; recoverable node labels: succession, manner, (vi)]

Figure 2

Simple multi-parent structure

So it is the cost of moving to arbitrary DAGs for discourse structure that we feel is 
too great to be taken lightly. This is what has led us to look for another explanation for 
these and other examples of apparent complex and crossing dependencies in discourse. 

The position we argue for in this paper is that while adjacency and explicit
conjunction (coordinating conjunctions such as "and", "or", "so" and "but"; subordinating
conjunctions such as "although", "whereas", "when", etc.) imply discourse relations
between (the interpretation of) adjacent or conjoined discourse units, discourse adverbials
such as "then", "otherwise", "nevertheless" and "instead" are anaphors, signalling a
relation between the interpretation of their matrix clause and an entity in or derived from
the discourse context. This position has four advantages.

1. Understanding discourse adverbials as anaphors recognises their behavioral
similarity with the pronouns and definite noun phrases (NPs) that are the
"bread and butter" of previous work on anaphora. This is discussed in
Section 1.

2. By understanding and exploring the full range of phenomena for which an 
anaphoric account is appropriate, we can better characterise anaphors and 
devise more accurate algorithms for resolving them. This is explored in
Section 2.

3. Any theory of discourse must still provide an account of how a sequence of
adjacent discourse units (clauses, sentences, and the larger units that they can
comprise) means more than just the sum of its component units. This is a goal
that researchers have been pursuing for some time, using both compositional
rules and defeasible inference to determine these additional aspects of
meaning (Asher and Lascarides, 1999; Gardent, 1997; Hobbs et al., 1993;
Kehler, 2002; Polanyi and van den Berg, 1996; Scha and Polanyi, 1988;
Schilder, 1997a; Schilder, 1997b; van den Berg, 1996). By factoring out that
portion of discourse semantics that can be handled by mechanisms already
needed for resolving other forms of anaphora and deixis, there is less need to
stretch and possibly distort compositional rules and defeasible inference to
handle everything.² Moreover, recognising the possibility of two separate



2 There is an analogous situation at the sentence level, where the relationship between syntactic 
structure and compositional semantics is simplified by factoring away inter-sentential anaphoric 
relations. Here the factorisation is so obvious that one does not even think about any other 
possibility. 
relations (one derived anaphorically and one associated with adjacency and/or
a structural connective) admits additional richness to discourse semantics.
Both points are discussed further in Section 3.

4. Understanding discourse adverbials as anaphors allows us to see more clearly 
how a lexicalised approach to the computation of clausal syntax and semantics 
extends naturally to the computation of discourse syntax and semantics, 
providing a single syntactic and semantic matrix with which to associate 
speaker intentions and other aspects of pragmatics. (Section 4)

The account we provide here is meant to be compatible with current approaches to
discourse semantics such as DRT (Kamp and Reyle, 1993; van Eijck and Kamp, 1997),
Dynamic Semantics (Stokhof and Groenendijk, 1999), and even SDRT (Asher, 1993;
Asher and Lascarides, forthcoming) - understood as a representational scheme rather
than an interpretive mechanism. It is also meant to be compatible with more detailed
analyses of the meaning and use of individual discourse adverbials, such as (Jayez and
Rossari, 1998a; Jayez and Rossari, 1998b; Traugott, 1995; Traugott, 1997). It provides
what we believe to be a more coherent account of how discourse meaning is computed,
rather than an alternative account of what that meaning is or what speaker intentions it
is being used to achieve.

1 Discourse Adverbials as Anaphors 

1.1 Discourse Adverbials do not behave like Structural Connectives 

We take the building blocks of the most basic level of discourse structure to be explicit
structural connectives between adjacent discourse units (i.e., coordinating and
subordinating conjunctions, and "paired" conjunctions such as "not only ... but also", "on the
one hand ... on the other (hand)", etc.) and inferred relations between adjacent discourse
units (in the absence of an explicit structural connective). Here, adjacency is what triggers
the inference. Consider the following example:

(3) You shouldn't trust John. He never returns what he borrows. 

Adjacency leads the hearer to hypothesize that a discourse relation of something like ex- 
planation holds between the two clauses. Placing the subordinate conjunction (structural 
connective) "because" between the two clauses provides more evidence for this relation. 
Our goal in this section is to convince the reader that many discourse adverbials - in- 
cluding "then" , "also" , "otherwise" , "nevertheless" , "instead" - do not behave in this 
way. 

Structural connectives and discourse adverbials do have one thing in common: Like 
verbs, they can both be seen as heading a predicate-argument construction; unlike verbs, 
their arguments are independent clauses. For example, both the subordinate conjunction 
"after" and the adverbial "then" (in its temporal sense) can be seen as binary predicates 
(e.g., sequence) whose arguments are clausally-derived events, with the earlier event in 
first position and the succeeding event in second. 

But that is the only thing that discourse adverbials and structural connectives have
in common. As we have pointed out in earlier papers (Webber, Knott, and Joshi, 2001;
Webber et al., 1999a; Webber et al., 1999b), structural connectives have two relevant
properties: (1) they admit stretching of predicate-argument dependencies; and (2) they
do not admit crossing of those dependencies. This is most obvious in the case of preposed
subordinate conjunctions (Example 4) or "paired" coordinate conjunctions (Example 5).
With such connectives, the initial predicate signals that its two arguments will follow.

(4) Although John is generous, he is hard to find. 
[Figure 3 diagram; recoverable root labels: concession[although], contrast[one/other]]

Figure 3

Discourse structures associated with (i) Example 7 and (ii) Example 8.

(5) On the one hand, Fred likes beans. On the other hand, he's allergic to them. 

Like verbs, structural connectives allow the distance between the predicate and its
arguments to be "stretched" over embedded material, without loss of the dependency between
them. For the verb "like" and an object argument "apples", such stretching without loss
of dependency is illustrated in Example 6b.

(6) a. Apples John likes. 

b. Apples Bill thinks he heard Fred say John likes. 

That this also happens with structural connectives and their arguments is illustrated
in Example 7 (in which the first clause of Example 4 is elaborated by another preposed
subordinate-main clause construction embedded within it) and Example 8 (in which the
first conjunct of Example 5 is elaborated by another paired-conjunction construction
embedded within it). Possible discourse structures for these examples are given in Figure 3.



(7) a. Although John is very generous - 

b. if you need some money, 

c. you only have to ask him for it - 

d. he's very hard to find. 

(8) a. On the one hand, Fred likes beans. 

b. Not only does he eat them for dinner. 

c. But he also eats them for breakfast and snacks. 

d. On the other hand, he's allergic to them. 

But, as already noted, structural connectives do not admit crossing of predicate-argument
dependencies. If we do this with Examples 7 and 8, we get

(9) a. Although John is very generous - 

b. if you need some money - 

c. he's very hard to find - 

d. you only have to ask him for it. 

(10) a. On the one hand, Fred likes beans. 

b. Not only does he eat them for dinner. 

c. On the other hand, he's allergic to them. 

d. But he also eats them for breakfast and snacks. 

Possible discourse structures for these (impossible) discourses are given in Figure 4. Even
if the reader finds no problem with these crossed versions, they clearly do not mean the
[Figure 4 diagram: two structures, (i) and (ii), each with an elaboration node over clauses a, b, c, d]

Figure 4

(Impossible) discourse structures that would have to be associated with Example 9 (i) and
with Example 10 (ii).
same thing as their uncrossed counterparts: In (10), "but" now appears to link (10d) with
(10c), conveying that despite being allergic to beans, Fred eats them for breakfast and
snacks. And while this might be inferred from (8), it is certainly not conveyed directly.
As a consequence, we stipulate that structural connectives do not admit crossing of their
predicate-argument dependencies.³

That is not all. Since we take the basic level of discourse structure to be a consequence 
of (a) relations associated with explicit structural connectives and (b) relations whose 
defeasible inference is triggered by adjacency, we stipulate that discourse structure itself 
does not admit crossing structural dependencies. (In this sense, discourse structure may 
be truly simpler than sentence structure. To verify this, one might examine the discourse 
structure of languages such as Dutch that allow crossing dependencies in sentence-level 
syntax. Initial cursory examination does not give any evidence of crossing dependencies
in Dutch discourse.)
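
The difference between stretching and crossing can be made concrete with a small check. The sketch below is a simplification introduced here for illustration, not part of the account itself: each structural connective is reduced to an arc between the positions of its two argument clauses (a, b, c, d numbered 0 to 3), whereas arguments may in general be larger discourse units.

```python
from itertools import combinations
from typing import List, Tuple

Arc = Tuple[int, int]  # (position of one argument, position of the other)


def crosses(a: Arc, b: Arc) -> bool:
    """True if the arcs interleave: one starts strictly inside the other and ends outside it."""
    (a1, a2), (b1, b2) = sorted(a), sorted(b)
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2


def has_crossing(arcs: List[Arc]) -> bool:
    """True if any pair of dependencies in the discourse crosses."""
    return any(crosses(a, b) for a, b in combinations(arcs, 2))


# Clauses a, b, c, d are numbered 0-3.
EXAMPLE_7 = [(0, 3), (1, 2)]  # although(a, d), if(b, c): nested, i.e. "stretched"
EXAMPLE_9 = [(0, 2), (1, 3)]  # although(a, c), if(b, d): crossed

print(has_crossing(EXAMPLE_7))
print(has_crossing(EXAMPLE_9))
```

Nesting one dependency inside another leaves the check false however far the outer arc is stretched, while interleaving even two adjacent dependencies makes it true.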

If we now consider the corresponding properties of discourse adverbials, we see that
they do admit crossing of predicate-argument dependencies, as shown in Examples 11-13.



(11) a. John loves Barolo. 

b. So he ordered three cases of the '97. 

c. But he had to cancel the order 

d. because then he discovered he was broke. 

(12) a. High heels are fine for going to the theater. 

b. But wear comfortable shoes 

c. if instead you plan to go to the zoo. 

(13) a. Because Fred is ill 

b. you will have to stay home 

c. whereas otherwise the two of you could have gone to the zoo. 

Consider first the discourse adverbial "then" in clause (11d). For it to get its first
argument from (11b) - i.e., the event that the discovery in (d) is "after" - it must cross
the structural connection between clauses (c) and (d) associated with "because". This
crossing dependency is illustrated in Figure 5i. Now consider the discourse adverbial
"instead" in clause (12c). For it to get its first argument from (12a) - i.e., going to the zoo
is an alternative to going to the theater - it must cross the structural connection between
clauses (12b) and (12c) associated with "if". This crossing dependency is illustrated in
Figure 5ii. Example 13 is its mirror image: For the discourse adverbial "otherwise" in
(13c) to get its first argument from (13a) - i.e., alternatives to the state/condition of
Fred being ill - it must cross the structural connection associated with "because". This
is illustrated in Figure 5iii.

Crossing dependencies are not unusual in discourse when one considers anaphora 
(e.g., pronouns and definite NPs), as for example in: 

(14) Every man_i tells every woman_j he_i meets that she_j reminds him_i of his_i mother.

(15) Sue_i drives an Alfa Romeo. She_i drives too fast. Mary_j races her_i on weekends.
She_j often beats her_i. (Strube, 1998)



3 A reviewer has asked how much "stretching" is possible in discourse without losing its thread or 
having to rephrase later material in light of the intervening material. One could ask a similar 
question about the apparently unbounded dependencies of sentence-level syntax, which inattentive 
speakers are prone to lose track of and "fracture" . Neither question seems answerable on theoretical 
grounds alone, demanding substantial amounts of empirical data from both written and spoken 
discourse. The point we are trying to make is simply that there is a difference in discourse between 
any amount of stretching and even the smallest amount of crossing. 
[Figure 5 diagram; recoverable node label: contrast[but]]

Figure 5

Discourse structures for Examples 11-13. Structural dependencies are indicated by solid lines
and dependencies associated with discourse adverbials are indicated by dashed lines.
(explanation' is the inverse of explanation - i.e., with its arguments in reverse order. Such
relations are used to maintain the given linear order of clauses.)
This suggests that in Examples 11-13, the relationship between the discourse adverbial
and its (initial) argument from the previous discourse might usefully be taken to be
anaphoric as well.⁴

1.2 Discourse Adverbials do behave like Anaphors 

There is additional evidence to suggest that "otherwise", "then" and other discourse 
adverbials are anaphors. First, anaphors in the form of definite and demonstrative NPs 
can take implicit material as their referents. For example, in 

(16) Stack five blocks on top of one another. Now close your eyes and try knocking 
{the tower, this tower} over with your nose. 

both NPs refer to the structure which is the implicit result of the block stacking.
(Further discussion of such examples can be found in (Isard, 1975; Dale, 1992; Webber and
Baldwin, 1992).) The same is true of discourse adverbials. In

(17) Do you want an apple? Otherwise you can have a pear. 

the situation in which you can have a pear is one in which you don't want an apple - i.e.,
where your answer to the question is "no". But this answer isn't there structurally: it is
only inferred. While it appears natural to resolve an anaphor to an inferred entity, it would
be much more difficult to establish such links through purely structural connections: to do
so would require complex transformations that introduce invisible elements into discourse
syntax with no deeper motivation. For example, in (17), we would need a rule that takes
a discourse unit consisting solely of a yes/no question "P?" and replaces it with a complex
segment consisting of "P?" and the clause "it is possible that P", with the two related by
something like elaboration. Then and only then could we account for the interpretation
of the subsequent "otherwise" structurally, by a syntactic link to the covert material (i.e.,
to the possibility that P holds, which "otherwise" introduces an alternative to).

Secondly, discourse adverbials have a wider range of options with respect to their 
initial argument than do structural connectives (i.e., coordinating and subordinating con- 
junctions). The latter are constrained to linking a discourse unit on the right frontier of 
the evolving discourse (i.e., the clause, sentence and larger discourse units to its immedi- 
ate left). Discourse adverbials are not so constrained. To see this, consider the following 
example: 

(18) If the light is red, stop. Otherwise you'll get a ticket. 

(If you do something other than stop, you'll get a ticket.)

This can be paraphrased using the conjunction "or":

If the light is red, stop, or you'll get a ticket. 

Here "or" links its right argument to a unit on the right frontier of the evolving discourse 
- in this case, the clause "stop" on its immediate left. Now consider the related example 



4 We are aware that "crossing" examples such as (11)-(13) are rare in naturally-occurring discourse.
We believe that this is because they are only possible when, as here, strong constraints from the 
discourse adverbial and from context prevent the adverbial from relating to the closest (leftmost) 
eventuality or an eventuality coerced from that one. But rarity doesn't necessarily mean 
ill-formedness or marginality, as readers can see for themselves if they use Google to search the web 
for strings such as "because then" , "if instead" , "whereas otherwise" , etc. and consider (a) whether 
the hundreds, even thousands, of texts in which these strings occur are ill-formed, and (b) what 
"then" , "instead" and "otherwise" are relating in these texts. One must look at rare events if one is 
studying complex linguistic phenomena in detail. Thus it is not the case that only common things 
in language are real or worth further study. 
(19) If the light is red, stop. Otherwise go straight on.
(If the light is not red, go straight on.)

This cannot be paraphrased with "or" , as in 

(20) If the light is red, stop, or go straight on.

even though both "stop" and "if the light is red, stop" are on the right frontier of the
evolving discourse structure. This is because "otherwise" is accessing something else, so
that (20) means something quite different from either (18) or (19). What "otherwise" is
accessing, which "or" cannot, is the interpretation of the condition alone.⁵ Thus discourse
adverbials, like other anaphors, have access to material that is not available to structural
connectives.
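
The point about what a structural connective can reach may be easier to see with the right frontier computed explicitly. The following is a minimal sketch under assumptions made here for illustration only: discourse structure is a binary tree of `(relation, left, right)` tuples with clause strings as leaves, and the relation label `condition` is hypothetical.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, "Tree", "Tree"]]  # leaf clause, or (relation, left, right)


def right_frontier(tree: Tree) -> List[Tree]:
    """Return the units on the right frontier, from the root down to the rightmost leaf."""
    frontier = [tree]
    while not isinstance(tree, str):
        _, _, tree = tree  # descend into the right daughter
        frontier.append(tree)
    return frontier


# "If the light is red, stop."
FIRST_SENTENCE = ("condition", "the light is red", "stop")

for unit in right_frontier(FIRST_SENTENCE):
    print(unit)
```

The frontier contains the whole conditional and the clause "stop", but not the condition "the light is red" on its own, which is the material that "otherwise" picks up in (19).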

Finally, discourse adverbials, like other anaphors, may require semantic representa- 
tions in which their arguments are bound variables ranging over discourse entities. That 
is, while it might be possible to represent "Although P, Q" using a binary modal operator 



(21) although(p, q) 

where formulas p and q translate the sentences P and Q that "although" combines, we 
cannot represent "P ... Nevertheless, Q" this way. We need something more like 

(22) p ∧ nevertheless(e, q)

The motivation for the variable e in this representation is that discourse adverbials,
like pronouns, can appear intra-sententially in an analogue of donkey sentences. Donkey
sentences such as Example 23 are a special kind of bound-variable reading.

(23) Every farmer who owns a donkey feeds it rutabagas. 

In donkey sentences, anaphors are interpreted as co-varying with their antecedents: the "it"
that is being fed in (23) varies with the farmer who feeds it. However, these anaphors
appear in a structural and interpretive environment in which a direct syntactic relationship
between anaphor and antecedent is normally impossible, so the co-variation cannot be a reflex
of true binding in the syntax-semantics interface. Rather, donkey sentences show that discourse
semantics has to provide variables to translate pronouns, and that discourse mechanisms
must interpret these variables as bound - even though the pronouns appear "free" by
syntactic criteria.

Thus, it is significant that discourse adverbials can appear in their own version of 
donkey sentences, as in 

(24) a. Anyone who has developed innovative new software has then had to hire
a lawyer to protect his/her interests. (i.e., after developing innovative new
software)

b. Several people who have developed innovative new software have nevertheless
failed to profit from it. (i.e., despite having developed innovative new software)

c. Every person selling "The Big Issue" might otherwise be asking for spare
change. (i.e., if s/he weren't selling "The Big Issue")

The examples in (24) involve binding in the interpretation of discourse adverbials. In
(24a), the temporal use of "then" locates each hiring event after the corresponding
software development. Likewise in (24b), the adversative use of "nevertheless" signals each
developer's eye-opener in failing to turn the corresponding profit. And in (24c), "otherwise"
envisions each person begging if that person weren't selling "The Big Issue".



5 This was independently pointed out by several people when this work was presented at ESSLLI'01
in Helsinki, August 2001. The authors would like to thank Natalia Modjeska, Lauri Karttunen, 
Mark Steedman, Robin Cooper and David Traum for bringing it to their attention. 
Such bound interpretations require variables in the semantic representations and
alternative values for them in some model - hence the representation given in (22). Indeed,
it is clear that the binding here has to be the discourse kind, not the syntactic kind - for
the same reason as in (23), although we cannot imagine anyone arguing otherwise, since
discourse adverbials have always been treated as elements of discourse interpretation.
So the variables must be the discourse variables usually used to translate other kinds of
discourse anaphors.⁶

These arguments have been directed at the behavioral similarity between discourse
adverbials and what we normally take to be discourse anaphors. But this isn't the only
reason to recognise them as anaphors: In the next section, we suggest a framework for
anaphora that is sufficiently broad to include discourse adverbials as well as
definite and demonstrative pronouns and NPs, and other discourse phenomena that have
been argued to be anaphoric, such as VP ellipsis (Hardt, 1992; Kehler, 2002), tense
(Partee, 1984; Webber, 1988) and modality (Kibble, 1995; Frank and Kamp, 1997; Stone
and Hardt, 1999).

2 A Framework for Anaphora

Here we show how only a single extension to a general framework for discourse anaphora
is needed to cover discourse adverbials. The general framework is presented in Section 2.1
and the extension in Section 2.2.



2.1 Discourse referents and anaphor interpretation 

The simplest discourse anaphors are coreferential - definite pronouns and definite NPs
that denote one (or more) discourse referents in focus within the current discourse
context. (Under coreference we include split reference, where a plural anaphor such as "the
companies" denotes all the separately mentioned companies in focus within the discourse
context.) Much has been written about the factors affecting what discourse referents are
taken to be in focus. For a recent review by Andrew Kehler, see Chapter 18 of (Jurafsky
and Martin, 2000). For the effect of different types of quantifiers on discourse referents
and focus, see (Kibble, 1995).



Somewhat more complex than coreference is indirect anaphora (Hellman and Fraurud,
1996) (also called partial anaphora (Luperfoy, 1992), textual ellipsis (Hahn, Markert,
and Strube, 1996), associative anaphora (Cosse, 1996), bridging anaphora (Clark, 1975;
Clark and Marshall, 1981; Not, Tovena, and Zancanaro, 1999), and inferrables (Prince,
1992)), where the anaphor - usually a definite NP - denotes a discourse referent
associated with one (or more) discourse referents in the current discourse context - e.g.



(25) Myra darted to a phone and picked up the receiver. 

Here the receiver denotes the receiver associated with (by virtue of being part of) the 
already-mentioned phone Myra darted to. 

Coreference and indirect anaphora can be uniformly modelled by saying that the
discourse referent $e_\alpha$ denoted by an anaphoric expression $\alpha$ is either equal to or
associated with an existing discourse referent $e_r$ - that is, $e_\alpha = e_r$ or
$e_\alpha \in \mathit{assoc}(e_r)$.



6 While Rhetorical Structure Theory (RST) (Mann and Thompson, 1988) was developed as an
account of the relation between adjacent units within a text, Marcu's guide to RST annotation
(Marcu, 1999) has added an "embedded" version of each RST relation in order to handle examples
such as (24). While this importantly recognises that material in an embedded clause can bear a
semantic relation to its matrix clause, it does not contribute to understanding the nature of the
phenomenon.
But coreference and associative anaphora do not exhaust the space of constructs that
derive all or part of their sense from the discourse context and are thus anaphoric.
Consider "other NPs" (Bierner, 2001a; Bierner and Webber, 2000; Modjeska, 2001; Modjeska,
2002), as in:

(26) Sue grabbed one phone, as Tom darted to the other phone. 

While "other NPs" are clearly anaphoric, should the referent of "the other phone" ($e_\alpha$)
- the phone other than the one Sue grabbed ($e_r$) - be simply considered a case of
$e_\alpha \in \mathit{assoc}(e_r)$? Here are two reasons why not.

First, in all cases of associative anaphora discussed in the literature, possible
associations have depended only on the antecedent $e_r$ and not on the anaphor. For example,
only antecedents that have parts participate in whole-part associations (e.g. phone →
receiver). Only antecedents with functional schemata participate in schema-based
associations (e.g. lock → key). In (26), the relationship between $e_\alpha$, the referent of
"the other phone", and its antecedent, $e_r$, depends in part on the anaphor, and not just on the
antecedent - in particular, on the presence of the word "other". Secondly, we also have
examples such as

(27) Sue lifted the receiver as Tom darted to the other phone.⁷

where the referent of "the other phone" ($e_\alpha$) is the phone other than the phone associated
with the receiver that Sue lifted. Together, these two points argue for a third possibility,
in which an anaphoric element can convey a specific function $f_\alpha$ that is idiosyncratic
to the anaphor, which may be applied to either $e_r$ or an associate of $e_r$. The result of
that application is $e_\alpha$. For want of a better name, we will call these lexically-specified
anaphors.

Other lexically-specified anaphors include noun phrases headed by "other" (Example 28),
NPs with "such" but no post-modifying "as" phrase (Example 29), comparative
NPs with no post-modifying "than" phrase (Example 30), and the pronoun "elsewhere"
(Example 31) (Bierner, 2001b).



(28) Some dogs are constantly on the move. Others lie around until you call them. 

(29) I saw a 2kg lobster in the fish store yesterday. The fishmonger said it takes about 
5 years to grow to such a size. 

(30) Terriers are very nervous. Larger dogs tend to have calmer dispositions. 

(31) I don't like sitting in this room. Can we move elsewhere? 

To summarize the situation with anaphors so far, we have coreference when e_a = e_r,
indirect anaphora when e_a ∈ assoc(e_r), and lexically-specified anaphora when e_a = f_α(e_i)
where e_i = e_r or e_i ∈ assoc(e_r).
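
The three cases can be made concrete with a small Python sketch. It is an illustration only, not part of the formal account: the entity names, the KIND and ASSOC tables, and the decision to let the function for "other" return a set of alternatives are all assumptions introduced here, modelled on Examples 26 and 27.

```python
# Toy domain (hypothetical): two phones and the receiver of the first one.
KIND = {"phone1": "phone", "phone2": "phone", "receiver1": "receiver"}
ASSOC = {"receiver1": {"phone1"}, "phone1": {"receiver1"}}


def assoc(e_r):
    """Entities reachable from e_r by a bridging inference."""
    return ASSOC.get(e_r, set())


def other(e_i):
    """Assumed f_alpha for "other": same-kind entities, excluding e_i itself."""
    return {e for e, kind in KIND.items() if kind == KIND[e_i] and e != e_i}


def classify(e_a, e_r, f_alpha=None):
    """Name the anaphoric relation linking referent e_a to antecedent e_r."""
    if e_a == e_r:
        return "coreference"
    if e_a in assoc(e_r):
        return "indirect"
    if f_alpha is not None:
        # e_i ranges over e_r itself and its associates.
        for e_i in {e_r} | assoc(e_r):
            if e_a in f_alpha(e_i):
                return "lexically-specified"
    return "unresolved"
```

With the receiver Sue lifted as e_r, as in Example 27, `classify("phone2", "receiver1", other)` is classified as lexically-specified, because "other" is applied to the phone associated with the receiver rather than to the receiver itself.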

2.2 Discourse Adverbials as Lexical Anaphors 

There is nothing in this generalised approach to discourse anaphora that requires that the
source of e_r be an NP, or that the anaphor be a pronoun or NP. For example, the antecedent e_r
of a singular demonstrative pronoun (in English, "this" or "that") is often an eventuality
that derives from a clause, a sentence, or a larger unit in the recent discourse (Asher,
1993; Byron, 2002; Eckert and Strube, 2000; Webber, 1991). We will see that this is the
case with discourse adverbials as well.

7 Modjeska (2001) discovered such examples in the British National Corpus.

The extension we make to the general framework presented above in order to include
discourse adverbials as discourse anaphors is to allow more general functions to be
associated with lexically-specified anaphors. In particular, for the discourse adverbials
considered in this paper, the function associated with an adverbial maps its anaphoric
argument - an eventuality derived from the current discourse context - to a function that
applies to the interpretation of the adverbial's matrix clause (itself an eventuality). The
result is a binary relation that holds between the two eventualities and is added to the
discourse context. For example, in

(32) John loves Barolo. So he ordered three cases of the '97. But he had to cancel the
order because he then discovered he was broke.

"then", roughly speaking, contributes the fact that its matrix clause event (John finding
he was broke) is after the anaphorically-derived event of his ordering the wine. Similarly,
in

(33) John didn't have enough money to buy a mango. Instead, he bought a guava.

"instead" contributes the fact that its matrix clause event (buying a guava) is an
alternative to the anaphorically derived event of buying a mango. The relation between
the two sentences is something like result, as in "So instead, he bought a guava."

Note that our only concern here is with the compositional and anaphoric mechanisms
by which adverbials contribute meaning. For detailed analysis of their lexical semantics
(but no attention to mechanism), the reader is referred to (Jayez and Rossari, 1998a;
Jayez and Rossari, 1998b; Lagerwerf, 1998; Traugott, 1995; Traugott, 1997) and others.



Formally, we represent the function that a discourse adverbial α contributes as a
λ-expression involving a binary relation R_α that is idiosyncratic to α, one of whose
arguments (represented here by the variable EV) is resolved anaphorically:

λx . R_α(x, EV)

R_α gets its other argument compositionally, when this λ-expression is applied to α's
matrix clause S interpreted as an eventuality σ - that is,

[λx . R_α(x, EV)]σ ≡ R_α(σ, EV)

The result of both function application and resolving EV to some eventuality e_i derived
from the discourse context either directly or by association, is the proposition R_α(σ, e_i),
one of whose arguments (e_i) has been supplied by the discourse context and the other
(σ) has been supplied compositionally from syntax.
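
The two steps, function application and anaphor resolution, can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: the Relation tuple, the string placeholder for EV, and the helper names adverbial and resolve are hypothetical, and the labels e4 and e2 anticipate the analysis of Example 32 given in Section 2.4.

```python
from typing import Callable, NamedTuple


class Relation(NamedTuple):
    """A proposition R(first, second) over eventuality labels."""
    name: str
    first: str
    second: str


EV = "EV"  # placeholder for the not-yet-resolved anaphoric argument


def adverbial(relation_name: str) -> Callable[[str], Relation]:
    """Build lambda x . R_alpha(x, EV) for the relation called relation_name."""
    return lambda x: Relation(relation_name, x, EV)


def resolve(prop: Relation, e_i: str) -> Relation:
    """Replace the anaphoric variable EV by the eventuality e_i."""
    if prop.second != EV:
        raise ValueError("anaphoric argument already resolved")
    return prop._replace(second=e_i)


then = adverbial("after")              # "then" contributes lambda x . after(x, EV)
unresolved = then("e4")                # applied to sigma = e4: after(e4, EV)
resolved = resolve(unresolved, "e2")   # EV resolved to e2: after(e4, e2)
```

Keeping application and resolution as separate functions mirrors the point that one argument is supplied compositionally and the other by the discourse context; nothing here is meant to suggest an order of processing.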

Note that this is a formal model, meant to have no implications for how processing
takes place. Our view of processing is that it is triggered by the discourse adverbial
and its matrix clause. Given α and σ, the resolution process finds an eventuality e_i (or
creates an appropriate one by a bridging inference, as illustrated in the next section)
such that R_α(σ, e_i) makes sense with respect to the discourse. This is best seen as a
constraint satisfaction problem similar to that of resolving a discourse deictic (Asher,
1993; Byron, 2002; Eckert and Strube, 2000; Webber, 1991). That is, the process involves
finding or deriving an eventuality from the current discourse context that meets the
constraints of the adverbial with respect to the eventuality interpretation of the matrix
clause. (Examples of this are given throughout the rest of the paper.)



8 Words and phrases that function as discourse adverbials usually have other roles as well - e.g., 
"otherwise" also serves as an adjectival modifier, in "I was otherwise occupied with grading exams". 
This overloading of closed-class lexico-syntactic items is not unusual in English, and any 
ambiguities that arise must be handled as part of the normal ambiguity resolution process. 






Computational Linguistics 



Volume 16, Number 1 



2.3 A Logical Form for Eventualities 

Before using this generalised view of anaphora to show what discourse adverbials contribute
to discourse and how they interact with discourse relations that arise from adjacency
or explicit discourse connectives, we briefly describe how we represent clausal
interpretations in logical form (LF).

Essentially, we follow (Hobbs, 1985) in using a rich ontology and a representation
scheme that makes explicit all the individuals and abstract objects (i.e., propositions,
facts/beliefs and eventualities) (Asher, 1993) involved in the logical form (LF) interpretation
of an utterance. We do so because we want to make intuitions about individuals,
eventualities, lexical meaning and anaphora as clear as possible. But certainly, other
forms of representation are possible.

In this LF representation scheme, each clause and each relation between clauses is indexed
by the label of its associated abstract object. So, for example, the LF interpretation
of the sentence

(34) John left because Mary left.

would be written

e1:left(j) ∧ john(j) ∧ e2:left(m) ∧ mary(m) ∧ e3:because(e1,e2)

where the first argument of the asymmetric binary predicate because is the consequent
and the second is the eventuality leading to this consequent. Thus when "because" occurs
sentence-medially, as in the above example, the eventuality arguments are in the same
order as their corresponding clauses occur in the text. When "because" occurs sentence-initially
(as in "Because Mary left, John did"), the interpretation of the second clause
("John [left]") will appear as the first argument and the interpretation of the first clause
("Mary left") will appear as the second.

The set of available discourse referents includes not only individuals like j and m but
also abstract objects like e1 and e2. We then represent resolved anaphors by re-using these
discourse referents. So, for example, the LF interpretation of the follow-on sentence

(35) This upset Sue.

would be written

e4:upset'(DPRO, s) ∧ sue'(s)

where DPRO is the anaphoric variable contributed by the demonstrative pronoun "this".
Since the subject of "upset" could be either the eventuality of John's leaving or the fact
that he left because Mary left, DPRO could be resolved to either e1 or e3 - i.e.,

a. e4:upset'(e1, s) ∧ sue'(s)

b. e4:upset'(e3, s) ∧ sue'(s)

depending on whether one took Sue to have been upset by (a) John's leaving or (b) the fact that
he left because Mary left.
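
As an informal illustration of this labelling scheme, the LF for Example 34 can be stored as a mapping from labels to predications, and the two readings of Example 35 generated by substituting each candidate label for DPRO. The dictionary layout and the helper resolve_dpro are assumptions made for this sketch; the choice of e1 and e3 as candidates simply follows the text and is not computed.

```python
# Labelled LF for "John left because Mary left": label -> (predicate, arguments).
lf = {
    "e1": ("left", ("j",)),
    "e2": ("left", ("m",)),
    "e3": ("because", ("e1", "e2")),
}


def resolve_dpro(predicate, args, antecedents):
    """Return one resolved predication per candidate antecedent of DPRO."""
    return [
        (predicate, tuple(ante if a == "DPRO" else a for a in args))
        for ante in antecedents
    ]


# Candidates named in the text: John's leaving (e1) and the because-relation (e3).
# Each must be a label already present in the discourse context.
candidates = [label for label in ("e1", "e3") if label in lf]
readings = resolve_dpro("upset", ("DPRO", "s"), candidates)
```

Because relations between clauses receive labels of their own, the because-relation e3 is available as an antecedent on exactly the same footing as the clause-level eventuality e1.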



9 We are not claiming to give a detailed semantics of discourse connectives except insofar as they may
affect how discourse adverbials are resolved. Thus, for example, we are not bothering to distinguish
between different senses of "because" (epistemic vs. non-epistemic), "while" (temporal vs.
concessive), "since" (temporal vs. causal), etc. Of course, these distinctions are important to
discourse interpretation, but they are independent of and orthogonal to the points made in this
paper. Similarly, Asher (1993) argues that a simple ontology of eventualities is too coarse-grained,
and that discourse representations need to distinguish different kinds of abstract objects, including
actions, propositions and facts as well as eventualities. Different discourse connectives will require
different kinds of abstract objects as arguments. This distinction is also orthogonal to the points
made in this paper, because we can understand these abstract referents to be associates of the
corresponding Hobbsian eventualities and leave the appropriate choice to the lexical semantics of
discourse connectives. Byron (2002) advocates a similar approach to resolving discourse anaphora.

2.4 The Contribution of Discourse Adverbials to Discourse Semantics

Here we step through some examples of discourse adverbials and how they make their
semantic contribution to the discourse context. We start with Example 32, repeated here
as (36).

(36) a. John loves Barolo. 

b. So he ordered three cases of the '97. 

c. But he had to cancel the order 

d. because he then discovered he was broke. 



Using the above LF representation scheme and our notation from Section 2.2, namely

• α = the anaphoric expression (here, the discourse adverbial)

• R_α = the relation name linked with α

• S = the matrix clause/sentence containing α

• σ = the interpretation of S as an abstract object

and ignoring, for now, the conjunction "because" (to be discussed in Section 3), the
relevant elements of (36d) can be represented as:

α = then
R_α = after
S = he [John] discovered he was broke
σ = e4:find(j,e5), where e5:broke(j)

This means that the unresolved interpretation of (36d) is

[λx . R_α(x,EV)]σ ≡ [λx . after(x,EV)]e4 ≡ after(e4,EV)

The anaphoric argument EV is resolved to the eventuality e2, derived from (36b) -
e2:order(j, c1).

after(e4,EV) → after(e4,e2)

That is, the eventuality of John finding he was broke is after that of John ordering three
cases of the '97 Barolo. The resulting proposition after(e4,e2) would be given its own
index, e6, and added to the discourse context.

When "then" is understood temporally, as it is above, as opposed to logically, it
requires a culminated eventuality from the discourse context as its first argument (which
Vendler (1967) calls an achievement or an accomplishment). The ordering event in (36)
is such a Vendlerian accomplishment. In Example 37 though, there is no culminated
eventuality in the discourse context for "then" to take as its first argument.

(37) a. Go west on Lancaster Avenue.
b. Then turn right on County Line.

How does (37b) get its interpretation?

As with (36d), the relevant elements of (37b) can be represented as

α = then
R_α = after
S = turn right on County Line
σ = e3:turn-right(you, county_line)

and the unresolved interpretation of (37b) is thus

[λx . after(x, EV)]e3 ≡ after(e3, EV)

As for resolving EV, in a well-known paper, Moens and Steedman (1988) discuss several
ways in which an eventuality of one type (e.g., a process) can be coerced into an
eventuality of another type (e.g., an accomplishment, which Moens and Steedman call
a culminated process). In this case, the matrix argument of "then" (the eventuality of
"turning right on County Line") can be used to coerce the process eventuality in (37a)
into a culminated process of "going west on Lancaster Avenue until County Line". We
treat this coercion as a type of associative or bridging inference, as in the examples
discussed in Section 2.1. That is,

e2 = culmination(e1) ∈ assoc(e1), where e1:go-west(you, lancaster_ave)

Taking this e2 as the anaphoric argument EV of "then" yields the proposition

after(e3, e2)

That is, the eventuality of turning right onto County Line is after that of going west on
Lancaster Avenue to County Line. This proposition would be indexed and added to the
discourse context.
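
The coercion step can be sketched as a filter over the discourse context that yields the eventualities admissible as the anaphoric argument of temporal "then". The sketch is a rough illustration, not a resolution algorithm: the Eventuality record, the boolean culminated flag, and the bound parameter (the landmark supplied by the matrix clause) are simplifying assumptions, and the function only generates candidates without choosing among them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Eventuality:
    label: str
    description: str
    culminated: bool


def culmination(e: Eventuality, bound: str) -> Eventuality:
    """Bridging inference: coerce a process into a culminated process."""
    return Eventuality(f"culmination({e.label})",
                       f"{e.description} until {bound}",
                       True)


def then_candidates(context: List[Eventuality],
                    bound: Optional[str] = None) -> List[Eventuality]:
    """Eventualities that temporal "then" could take as its anaphoric argument.

    Culminated eventualities qualify directly. A process qualifies only via
    its culmination, and only if the matrix clause supplies a bound.
    """
    out = []
    for e in context:
        if e.culminated:
            out.append(e)
        elif bound is not None:
            out.append(culmination(e, bound))
    return out


e1 = Eventuality("e1", "going west on Lancaster Avenue", culminated=False)
cands = then_candidates([e1], bound="County Line")
```

For Example 37 the only candidate is the derived culmination of e1, which plays the role of e2 in the text; without a bound from the matrix clause the process offers no candidate at all.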

It is important to stress here that the level of representation we are concerned with
is essentially a logical form (LF) for discourse. Any reasoning that might then have to
be done on its content might require making explicit the different modal and
temporal contexts involved, their accessibility relations, the status of abstract objects as
facts, propositions or eventualities, etc. But as our goal here is primarily to capture the
mechanism by which discourse adverbials are involved in discourse structure and discourse
semantics, we will continue to assume for as long as possible that an LF representation
will suffice.

Now it may appear as if there is no difference between treating adverbials as anaphors
and treating them as structural connectives, especially in cases like (37) where the antecedent
comes from the immediately left-adjacent context, and where the only obvious
semantic relation between the adjacent sentences appears to be the one expressed by the
discourse adverbial. (Of course, there may also be a separate intentional relation between
the two sentences (Moore and Pollack, 1992), independent of the relation conveyed by
the discourse adverbial.)

One must distinguish, however, between whether a theory allows a distinction to be 
made and whether that distinction needs to be made in a particular case. It is clear that 
there are many examples where the two approaches (i.e., a purely structural treatment 
of all connectives, versus one that treats adverbials as linking into the discourse context 
anaphorically) appear to make the same prediction. However, we have already demon- 
strated cases where a purely structural account makes the wrong prediction, and in the 
next section, we will demonstrate the additional power of an account that allows for two 
relations between an adverbial's matrix clause or sentence and the previous discourse 
- one arising from the anaphoric connection and the other inferred from adjacency or 
conveyed explicitly by a structural connective. 

Before closing this section, we want to step through the two "otherwise" examples
introduced earlier, repeated here as Examples 38-39.

(38) If the light is red, stop. Otherwise you'll get a ticket. 

(39) If the light is red, stop. Otherwise go straight on. 

Roughly speaking, "otherwise" conveys that the complement of its anaphorically-derived 
argument serves as the condition under which the interpretation of its structural ar- 
gument holds. (This complement must be with respect to some contextually relevant 

If we represent a conditional relation between two eventualities with the asymmetric
relation if(e1,e2), where e1 is derived from the antecedent and e2 from the consequent,
and we approximate a single contextually relevant alternative e2 to an eventuality e1
using a symmetric complement relation, complement(e1, e2), then we can represent the
interpretation of "otherwise" as

λx . if(VE, x), where complement(VE, EV)

where variable EV is resolved anaphorically to an eventuality in the current discourse
context that admits a complement. That is, "otherwise" requires a contextually relevant
complement to its antecedent and asserts that if that complement holds, the argument
to the λ-expression will. The resulting λ-expression applies to the interpretation of the
matrix clause of "otherwise", resulting in the conditional being added to the discourse
context:

[λx . if(VE, x)] σ ≡ if(VE, σ), where complement(VE, EV)

Here the relevant elements of (38b) and (39b) can be represented as

α = otherwise
R_α = if
S_38 = you get a ticket
σ_38 = e3, where e3:get_ticket(you)
S_39 = go straight on
σ_39 = e3', where e3':go_straight(you)



The unresolved interpretations of (38b) and (39b) are thus:

[λx . if(VE_38, x)] e3 ≡ if(VE_38, e3), where complement(VE_38, EV_38)
[λx . if(VE_39, x)] e3' ≡ if(VE_39, e3'), where complement(VE_39, EV_39)




As we showed in Section 1.2, different ways of resolving the anaphoric argument lead
to different interpretations. In (38), the anaphoric argument is resolved to e2:stop(you),
while in (39), it is resolved to e1:red(light1). Thus the resulting interpretations of (38b)
and (39b) are, respectively

if(e4, e3), where complement(e2, e4) and e2:stop(you)
(If you do something other than stop, you'll get a ticket.)

if(e4', e3'), where complement(e1, e4') and e1:red(light1)
(If the light is not red, go straight on.)
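
The effect of the two resolutions can be sketched in a few lines of Python. This is an illustration under stated assumptions: the COMPLEMENT table stands in for whatever process supplies a contextually relevant alternative, and the function name otherwise and the tuple encoding of if(VE, σ) are our own. Only the labels e1 to e4' come from the text.

```python
# Hypothetical complement table: each eventuality label maps to the label of
# its contextually relevant alternative (e4 and e4' in the text).
COMPLEMENT = {"e2": "e4", "e1": "e4'"}


def otherwise(ev, sigma, complement=COMPLEMENT):
    """Apply lambda x . if(VE, x) to sigma, where complement(VE, EV) and EV = ev."""
    if ev not in complement:
        raise ValueError(f"{ev} admits no complement in this context")
    ve = complement[ev]
    return ("if", ve, sigma)


# (38): EV resolved to e2:stop(you); sigma is e3:get_ticket(you).
ticket_reading = otherwise("e2", "e3")
# (39): EV resolved to the red-light eventuality e1; sigma is e3':go_straight(you).
straight_reading = otherwise("e1", "e3'")
```

The same function yields both interpretations; the only difference lies in which eventuality the anaphoric argument is resolved to.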



10 Kruijff-Korbayová and Webber (2001a) demonstrate that the Information Structure of sentences in
the previous discourse (theme-rheme partitioning, as well as focus within theme and within rheme
(Steedman, 2000a)) can influence what eventualities are available for resolving the anaphorically
derived argument of "otherwise". This then correctly predicts different interpretations for
"otherwise" in (i) and (ii):

(i) Q: How should I transport the dog?

A: You should CARRY the dog. Otherwise you might get HURT.

(ii) Q: What should I carry?

A: You should carry THE DOG. Otherwise you might get HURT.

In both (i) and (ii), the questions constrain the theme/rheme partition of the answer. Small capitals
represent focus within the rheme. In (i), the "otherwise" clause will be interpreted as warning the
hearer (H) that H might get hurt if s/he transports the dog in some way other than carrying it
(e.g., H might get tangled up in its lead). In (ii), the "otherwise" clause warns H that s/he might
get hurt if what s/he is carrying is not the dog (e.g., H might be walking past fanatical members of
the Royal Kennel Club).

We have not been specific about how the anaphoric argument of "otherwise" (or of
any other discourse adverbial) is resolved, other than having it treated as a constraint
satisfaction problem. This is the subject of current and future work, exploring the empirical
properties of resolution algorithms with data drawn from appropriately annotated
corpora and from psycholinguistic studies of human discourse interpretation. To this end,
Creswell et al. (2002) report on a preliminary annotation study of discourse adverbials
and the location and type of their antecedents. This initial effort involves nine discourse
adverbials - three each from the classes of concessive, result and reinforcing (additive)
conjuncts given in (Quirk et al., 1972). Meanwhile, Venditti et al. (2002) present a preliminary
report on the use of a constraint-satisfaction model of interpretation, crucially
combining anaphoric and structural reasoning about discourse relations, to predict subjects'
on-line interpretation of discourses involving stressed pronouns. In addition, two
proposals have recently been submitted to construct a larger and more extensively annotated
corpus, covering more adverbials, based on what we have learned from this initial
effort. This more extensive study would be an adequate basis for developing resolution
algorithms.

2.5 Summary 

In this section, we have presented a general framework for anaphora with the following 
features: 

• Anaphors can access either one or more discourse referents or entities 
associated with them through bridging inferences. These are sufficient for 
interpreting anaphoric pronouns, definite NPs and demonstrative NPs, allowing 
entities to be evoked by NPs or by clauses. In the case of clauses, this may be 
on an "as needed" basis, as in (Eckert and Strube, 2000).



• A type of anaphor α that we call lexically-specified can also contribute
additional meaning through a function f_α that is idiosyncratic to α, that can
be applied to either an existing discourse referent or an entity associated with
it through a bridging inference. In the case of the premodifier "other", f_α
applied to its argument produces contextually-relevant alternatives to that
argument. In the case of the premodifier "such", it yields a set of entities that
are similar to its argument in a contextually-relevant way.

• Discourse adverbials are lexically-specified anaphors whose meaning function f_α
is a λ-expression involving a binary relation R_α that is idiosyncratic to α, one
of whose arguments is resolved anaphorically and the other is provided
compositionally, when the λ-expression is applied to α's matrix clause
interpreted as an eventuality σ.



11 With respect to how many discourse adverbials there are, Quirk et al. (1972) discuss 60
conjunctions and discourse adverbials under the overall heading "time relations" and 123 under the
overall heading "conjuncts". The same entries appear under several headings, so that the total
number of conjunctions and discourse adverbials they present is closer to 160. In another
enumeration of discourse adverbials, Forbes and Webber (2002) start with all annotations of
sentence-level adverbials in the Penn TreeBank, and then filter them systematically to determine
which draw part of their meaning from the preceding discourse and how they do so. What we
understand from both these studies is that there are fewer than 200 adverbials to be considered,
many of which are minor variations of each other - "in contrast", "by contrast", "by way of
contrast", "in comparison", "by comparison", "by way of comparison" - that are unlikely to differ in
their anaphoric properties, and some of which, such as "contrariwise", "hitherto" and "to cap it
all", will occur only rarely in a corpus of modern English.

In the next section, we move on to consider how the presence of both a semantic relation
associated with a discourse adverbial and a semantic relation associated with the
adjacency of two clauses or a structural connective between them allows for interesting
interactions between the two.

3 Patterns of Anaphoric Relations and Structural/Inferred Relations 

Prior to the current work, researchers have treated both explicit structural connectives
(coordinating and subordinating conjunctions, and "paired" conjunctions) and discourse
adverbials simply as evidence for a particular structural relation holding between adjacent
units. For example, Kehler (2002) takes "but" as evidence of a contrast relation between
adjacent units, "in general" as evidence of a generalization relation, "in other words" as
evidence of an elaboration relation, "therefore" as evidence of a result relation, "because" as
evidence of an explanation relation, and "even though" as evidence of a denial of preventer
relation (Kehler, 2002, Chapter 2.1). Here Kehler has probably correctly identified the
type of relation that holds between elements, but not which elements it holds between.

In one respect, we follow previous researchers, in that we accept that when clauses,
sentences or larger discourse units are placed adjacent to one another, listeners infer
a relation between the two, and that a structural connective (coordinating or subordinating
conjunction) gives evidence for the relation that is intended to hold between them.

However, because we take discourse adverbials to contribute meaning through an 
anaphoric connection with the previous discourse, this means that there may be two 
relations on offer, and opens the possibility that the relation contributed by the discourse 
adverbial can interact in more than one way with the relation conveyed by a structural 
connective or inferred through adjacency. Below we show that this prediction is correct. 

We start from the idea that - in the absence of an explicit structural connective -
defeasible inference correlates with structural attachment of adjacent discourse segments
in discourse structure, relating their interpretations. The most basic relation is that the
following segment in some way describes the same object or eventuality as the one it
abuts (elaboration). But evidence in the segments can lead (via defeasible inference) to
a more specific relation, such as one of the resemblance relations (e.g., parallel, contrast,
exemplification, generalisation), or cause-effect relations (result, explanation, violated
expectation), or contiguity relations (narration) described in (Hobbs, 1990; Kehler, 2002). If
nothing more specific can be inferred, the relation will remain simply elaboration. What
explicit structural connectives can do is convey relations that are not easy to convey
by defeasible inference (e.g., "if", conveying condition, and "or", conveying disjunction)
or provide non-defeasible evidence for an inferrable relation (e.g., "yet", "so" and "because").

Discourse adverbials can interact with structural connectives, with adjacency-triggered
defeasible inference and with each other. To describe the ways in which we have so far 
observed discourse adverbials to interact with relations conveyed structurally, we extend 
the notation used in the previous section: 

• α = the discourse adverbial;

• R_α = the name of the relation associated with α;

• S = the matrix clause/sentence of α;

• σ = the logical form (LF) interpretation of S;

adding the following:

• D = the discourse unit that is left-adjacent to S, to which a relationship holds
either by inference or a structural connective;

• δ = the LF interpretation of D;

• R = the name of the relation that holds with δ.

While δ is one argument of R, we show below that its other argument may be one of at
least two different abstract objects.

Case 1: σ separately serves as an argument to both R_α and R. This is the case
that holds in Example 36 (repeated below).

(36) a. John loves Barolo.

b. So he ordered three cases of the '97.

c. But he had to cancel the order

d. because he then discovered he was broke.

We have already seen that the interpretation of the clause in (36d) following "because"
involves:

R_α = after
σ = e4:discover(j,e5), where e5:broke(j)
[λx . after(x,EV)]e4 ≡ after(e4,EV)

where EV is resolved to e2:order(j, c1), and the proposition after(e4, e2) is added to the
discourse context - i.e., John's discovering he was broke is after his ordering the wine.

Now consider the explanation relation R associated with "because" in (36d). It relates
e4, John's finding he was broke, to the interpretation of (36c), e3:cancel(j,o1) - that is,
explanation(e4,e3). Clause (36d) thus adds both explanation(e4,e3) and after(e4,e2) to the
discourse. While these two propositions share an argument (e4), they are nevertheless
distinct.

Case 2: R_α(σ,e_i) is an argument of R. In Case 1, it is the interpretation of the
adverbial's matrix clause σ that serves as one argument to the discourse relation R. In
contrast, in Case 2, that argument is filled by the relation contributed by the discourse
adverbial (itself an abstract object available for subsequent reference). In both cases, the
other argument to R is δ.

One configuration in which Case 2 holds is with the discourse adverbial "otherwise".
Recall from Section 2.4 that the interpretation of "otherwise" involves a conditional
relation between the complement of its anaphoric argument and the interpretation σ of
its matrix clause:

[λx . if(VE,x)] σ ≡ if(VE,σ), where complement(VE,EV)

With variable EV resolved to an eventuality in the discourse context, it is the resulting
relation (viewed as an abstract object) that serves as one argument to R, with δ serving
as the other. We can see this most clearly by considering variants of examples (38) and
(39) that contain an explicit connective between the clauses. In (38), the conjunction
"because" is made explicit (Example 40), while in (39), the connective is simply "and"
or "but" (Example 41).



12 Because eventuality e4 "John's finding he was broke" both explains the cancelling and follows the
ordering, it follows that the cancelling is after the ordering.

(40) If the light is red, stop, because otherwise you'll get a ticket.
R_α = if
σ_40 = e3:get_ticket(you)

(41) If the light is red, stop, and/but otherwise go straight on.
R_α = if
σ_41 = e3':go_straight(you)

In the case of (40), resolving "otherwise" contributes the relation

e6: if(e4,e3), where complement(e4,e2) and e2:stop(you)
(If you do something other than stop, you'll get a ticket.)

At the level of logical form (LF), the abstract object e6 that is associated with the
conditional relation serves as one argument to the explanation relation contributed by
"because", with e2 being the other. That is, "because" and "otherwise" together end up
contributing explanation(e2,e6) (i.e., your needing to stop is explained by the fact that
if you do something other than stop, you'll get a ticket).
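
The difference between Case 1 and Case 2 comes down to which label fills the non-δ argument slot of R, and this can be checked mechanically once every proposition carries a label. The following Python sketch is illustrative only: the function interaction_case and the dictionary encoding are assumptions of ours, and e7 is a hypothetical label for the explanation proposition, which the text leaves unlabelled.

```python
def interaction_case(props, sigma, r_alpha_label, r_label):
    """Return 2 if the adverbial's relation is an argument of R, 1 if sigma is."""
    _, r_args = props[r_label]
    if r_alpha_label in r_args:
        return 2
    if sigma in r_args:
        return 1
    return None


# Example 36d: "then" adds e6: after(e4, e2); "because" adds explanation(e4, e3).
props_36 = {
    "e6": ("after", ("e4", "e2")),
    "e7": ("explanation", ("e4", "e3")),  # e7 is an assumed label
}
# Example 40: "otherwise" adds e6: if(e4, e3); "because" adds explanation(e2, e6).
props_40 = {
    "e6": ("if", ("e4", "e3")),
    "e7": ("explanation", ("e2", "e6")),  # e7 is an assumed label
}

case_36 = interaction_case(props_36, sigma="e4", r_alpha_label="e6", r_label="e7")
case_40 = interaction_case(props_40, sigma="e3", r_alpha_label="e6", r_label="e7")
```

In Example 36 the matrix eventuality e4 appears directly in the explanation relation (Case 1), whereas in Example 40 it is the label e6 of the conditional itself that appears there (Case 2).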

In the case of (41), resolving "otherwise" contributes the relation

e6': if(e4', e3'), where complement(e4',e1) and e1:red(light1)
(If the light is not red, go straight on.)

What is the discourse relation to which "otherwise" contributes this abstract object
e6'? Whether the connective is "and" or "but", both its conjuncts describe (elaborate)
alternative specializations of the same situation e0 introduced earlier in the discourse
(e.g., e0 could be associated with the first sentence of "Go another mile and you'll get to
a bridge. If the light is red, stop. Otherwise go straight on.") If the connective is "and",
what is added to context might simply be elaboration(e6',e0). (N.B. Without "otherwise",
the relation elaboration(e5,e0) would have been added to context, where e5 is the abstract
object associated with the interpretation of "If the light is red, stop".) If the connective
is "but", then one might also possibly add contrast(e6',e5) - i.e., the situation that [if
the light is red] you should stop is in contrast with the situation that if the light is not
red, you should go straight on.

As is clear from the original pair of examples (38) and (39), similar interpretations can
arise through adjacency-triggered inference as arise with an explicit connective. In either
case, the above treatment demonstrates that there is no need for a separate otherwise
relation, as proposed in Rhetorical Structure Theory (Mann and Thompson, 1988). We
are not, however, entirely clear at this point when Case 1 holds and when Case 2 does.
A more careful analysis is clearly required.

Case 3: Ra is parasitic on R. Case 3 appears to hold with discourse adverbials such 
as "for example" and "for instance" . Their interpretation appears to be parasitic on the 
relation associated with a structural connective or discourse adverbial to their left, or 
on an inferred relation triggered by adjacency. The way to understand this is to first 
consider intra-clausal "for example", where it follows the verb, as in 

(42) Q. What does this box contain? 

A. It contains, for example, some hematite. 



13 A much finer-grained treatment of the semantics of "otherwise" in terms of context-update
potential is given in (Kruijff-Korbayová and Webber, 2001b). Here we are just concerned with its
interaction with structural connectives and adjacency-triggered relations.
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The interpretation of "for example" here involves abstracting the meaning of its matrix 
structure with respect to the material to its right, and then making an assertion with 
respect to this abstraction. That is, if the LF contributed by the matrix clause of (|4^A) 
is, roughly, 

i. contain(box1, hematite1)

then the LF resulting from the addition of "for example" can be written either with set
notation (as in ii), taking an entity to exemplify a set, or with λ-notation (as in iii),
taking an entity to exemplify a property:

ii. exemplify(hematite1, {X | contain(box1, X)})
iii. exemplify(hematite1, λX . contain(box1, X))

Both express the fact that "hematite" is an example of what is contained in the box.
Since one can derive (i) logically from either (ii) or (iii), one might choose to retain
only (ii) or (iii) and derive (i) if and when it is needed. In the remainder of the paper,
we use the λ-notation given in (iii). Notice that from the perspective of compositional
semantics, "for example" resembles a quantifier, in that the scope of its interpretation
is not isomorphic to its syntactic position. Thus producing an interpretation for "for
example" will require similar techniques to those used in interpreting quantifiers. We
will take this up again in Section 4.2.
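
The relation between the three LFs can be made concrete with a small Python sketch. It is only an illustration under our own assumptions: the fact set `FACTS`, the second item `quartz1`, and the reading of exemplify as "the entity has the property" are hypothetical simplifications, not part of the analysis itself.

```python
# Hypothetical toy model: the facts known about what box1 contains.
FACTS = {("contain", "box1", "hematite1"), ("contain", "box1", "quartz1")}


def contain(container, item):
    """LF (i): contain(box1, hematite1) holds if the fact is in the model."""
    return ("contain", container, item) in FACTS


def exemplify(entity, prop):
    """LF (iii): the entity exemplifies a property (a one-place predicate).

    Assumption of this sketch: exemplifying a property entails having it,
    which is why (i) can be derived from (iii).
    """
    return prop(entity)


# (iii) exemplify(hematite1, λX . contain(box1, X))
contained_in_box1 = lambda x: contain("box1", x)
holds_iii = exemplify("hematite1", contained_in_box1)

# (ii) the set notation {X | contain(box1, X)}, built from the same facts
extension = {item for (_, container, item) in FACTS if container == "box1"}

print(holds_iii)
print("hematite1" in extension)
```

Applying the abstracted property to hematite1 recovers (i), and membership in the set built for (ii) gives the same answer, which is the sense in which either notation suffices.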

If we look at the comparable situation in discourse, such as (43)-(44), where "for
example" occurs to the right of a discourse connective, it can also be seen as abstracting
the interpretation of its discourse-level matrix structure, with respect to the material to
its right.

(43) John just broke his arm. So, for example, he can't cycle to work now. 

(44) You shouldn't trust John because, for example, he never returns what he borrows. 



In (43), the connective "so" leads to

result(σ, δ)

being added to the discourse, where σ is the interpretation of "John can't cycle to work
now", and δ is the interpretation of "John just broke his arm". "For example" then
abstracts this relation with respect to the material to its right (i.e., σ), thereby contributing:

exemplify(σ, λX . result(X, δ))

That is, "John can't cycle to work" is an example of what results from "John breaking
his arm". Similarly, "because" in (44) leads to

explanation(σ, δ)

being added to the discourse, where σ is the interpretation of "he never returns what he
borrows", δ is the interpretation of "you shouldn't trust John", and "for example" adds

exemplify(σ, λX . explanation(X, δ))

i.e., that σ is an example of the reasons for not trusting John.

14 The material to the right of "for example" can be any kind of constituent, including such strange
ones as

John gave, for example, a flower to a nurse.

Here, "a flower to a nurse" would be an example of the set of object-recipient pairs within John's
givings. Such non-standard constituents are also found with coordination, which was one motivation
for Combinatory Categorial Grammar (Steedman, 1996). This just illustrates another case where
such non-standard constituents are needed.

"For example" interacts with discourse adverbials in the same way: 

(45) Shall we go to the Lincoln Memorial? Then, for example, we can go to the White 
House. 

(46) As a money manager and a grass-roots environmentalist, I was very disappointed 
to read in the premiere issue of Garbage that The Wall Street Journal uses 
220,000 metric tons of newsprint each year, but that only 1.4% of it comes 
from recycled paper. By contrast, the Los Angeles Times, for example, uses 83% 
recycled paper. [WSJ, from Penn TreeBank /02/wsj-0269]

In Example (45), the resolved discourse adverbial "then" leads to after(σ, δ) being added to
the discourse context, where σ is the interpretation of "we can go to the White House",
δ is the interpretation of "we can go to the Lincoln Memorial", and "for example" adds

exemplify(σ, λX . after(X, δ))

i.e., that σ is an example of the events that [can] follow going to the Lincoln Memorial.
(N.B. As already noted, we are being fairly fast and loose regarding tense and modality,
in the interests of focussing on the types of interactions.)

In Example (46), the resolved discourse anaphor "by contrast" contributes contrast(σ, δ),
where σ is the interpretation of "the LA Times using 83% recycled paper" and δ is the
interpretation of "only 1.4% of it [newsprint used by the WSJ] comes from recycled paper".
"For example" then contributes

exemplify(σ, λX . contrast(X, δ))

i.e., that σ is one example of contrasts with the WSJ's minimal use of recycled paper.

What occurs with discourse connectives and adverbials can also occur with relations 
added through adjacency-triggered defeasible inference, as in 

(47) You shouldn't trust John. For example, he never returns what he borrows.
explanation(δ, σ)
exemplify(σ, λX . explanation(δ, X))

Here, as in (44), the relation provided by adjacency-triggered inference is R = explanation,
which is then used by "for example".
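
The parasitic pattern shared by these examples can be sketched in Python. This is a minimal sketch under our own assumptions: relations are treated as plain labelled tuples, and the names `Relation`, `abstract_over`, and `for_example` are hypothetical helpers introduced only for illustration. The argument labels σ and δ stand for the interpretations of the adverbial's matrix clause and of the left-adjacent unit, as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str    # e.g. "result", "explanation"
    args: tuple  # argument labels, e.g. ("σ", "δ")

    def __str__(self):
        return f"{self.name}({', '.join(self.args)})"


def abstract_over(relation, target, var="X"):
    """Replace the argument `target` of `relation` by the variable `var`."""
    if target not in relation.args:
        raise ValueError(f"{target} is not an argument of {relation}")
    new_args = tuple(var if arg == target else arg for arg in relation.args)
    return Relation(relation.name, new_args)


def for_example(relation, sigma):
    """Build the LF that "for example" contributes, given the relation R."""
    body = abstract_over(relation, sigma)
    return f"exemplify({sigma}, λX . {body})"


result = Relation("result", ("σ", "δ"))            # conveyed by "so" in (43)
explanation = Relation("explanation", ("δ", "σ"))  # inferred from adjacency in (47)

print(for_example(result, "σ"))
print(for_example(explanation, "σ"))
```

Whatever the source of R (a structural connective, a discourse adverbial, or adjacency-triggered inference), the same abstraction step applies; only the relation passed in changes.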

But what about the many cases where only exemplify seems present, as in 

(48) In some respects they [hypertext books] are clearly superior to normal books, 
for example they have database cross-referencing facilities ordinary volumes lack. 
[British National Corpus, CBX 1087] 

(49) He [James Bellows] and his successor, Mary Anne Dolan, restored respect for the 
editorial product, and though in recent years the paper had been limping along 
on limited resources, its accomplishments were notable. For example, the Herald 
consistently beat its much-larger rival on disclosures about Los Angeles Mayor 
Tom Bradley's financial dealings. 

There are at least two explanations: One is that "for example" simply provides direct
non-defeasible evidence for exemplify, which is the only relation that holds. The other
explanation follows the same pattern as the examples given above, but with no further
relation than elaboration(σ, δ). That is, we understand in (48) that "having database
cross-referencing facilities" elaborates the respects in which hypertext books are superior
to normal books, while in (49), we understand that "the Herald [newspaper] consistently
beating its much-larger rival" elaborates the claim that "its accomplishments were notable".
This elaboration relation is then abstracted (in response to "for example") to produce:

exemplify(σ, λX . elaboration(X, δ))

i.e., that this is one example of many possible elaborations. Because this is more specific
than elaboration and seems to mean the same as exemplify(σ, δ), one might simply take
it to be the only relation that holds. Given that so many naturally occurring instances
of "for example" occur with elaboration, it is probably useful to persist with the above
shorthand. But it shouldn't obscure the regular pattern that appears to hold.

Before going on to Case 4, we should comment on an ambiguity associated with "for 
example" . When "for example" occurs after an NP, PP or clause that can be interpreted 
as a general concept or a set, it can contribute a relation between the general concept/set 
and an instance, rather than being parasitic on another relation. For example, in: 

(50) In the case of the managed funds they will be denominated in a leading currency, 
for example US dollar, . . . [BNC CBX 1590] 

"for example" relates the general concept denoted by "a leading currency" to a specific 
instance, US dollars. (In "British" English, the BNC shows that most such examples 
occur with "such as" - i.e., in the construction "such as for example". This paraphrase 
does not work with the predicate-abstracting "for example" that is of primary concern 
here, such as in Example ^ .) 

But "for example" occurring after an NP, PP or clause can, alternatively, contribute 
a more subtle parasitic relationship to the previous clause, as in 

(51) All the children are ill, so Andrew, for example, can't help out in the shop. 



This differs from both (43) and (50). That is, one cannot paraphrase (51) as (52), as in
(43), where "for example" follows "so":

(52) All the children are ill, so for example Andrew can't help out in the shop.

(52) simply specifies an example consequence of all the children being ill, as does

(53) All the children are ill, so for example one of us has to be at home at times.

In contrast, (51) specifies an example consequence for Andrew, as one of the children.
Support for this comes from the fact that in (52), Andrew doesn't have to be one of the
children: he could be their nanny or child minder, now stuck with dealing with a lot of
sick kids. But (51) is not felicitous if Andrew is not one of the children.

We suspect here the involvement of Information Structure (Steedman, 2000a): While
the interpretation conveyed by "for example" is parasitic on the adjacency relation (result
in Example (51)), its position after the NP "Andrew" in (51) may indicate a contrastive
theme with respect to the previous clause, according to which Andrew, in contrast to the
other children, suffers this particular consequence. But more work needs to be done on
this to gain a full understanding of what is going on.

Case 4: Rα is a defeasible rule that incorporates R. Case 4 occurs with discourse
adverbials that carry the same presupposition as the discourse connectives "although"
and the concessive sense of "while" (Lagerwerf, 1998). Case 4 shares one feature with
Case 1, in that the discourse relation R conveyed by a structural connective or inferred
from adjacency holds between σ (the interpretation of the adverbial's matrix clause)
and δ (the interpretation of the left-adjacent discourse unit). Where it differs is that
the result is then incorporated into the presupposition of the discourse adverbial. This
presupposition, according to Lagerwerf (1998), has the nature of a presupposed (or
conventionally implicated) defeasible rule that fails to hold in the current situation. He gives
as an example

(54) Although Greta Garbo was called the yardstick of beauty, she never married.

This asserts both that Greta Garbo was called the yardstick of beauty and that she never
married. The first implies that Greta Garbo is beautiful. The example also presupposes
that, in general, if a woman is beautiful, she will marry. If such a presupposition can be
accommodated, it will simply be added to the discourse context. If not, the hearer will
find the utterance confusing or possibly even insulting.

We argue here that the same thing happens with the discourse adverbials "nevertheless"
and "though". The difference is that, with discourse adverbials, the antecedent to
the rule derives anaphorically from the previous discourse, while the consequent derives
from the adverbial's matrix clause. (With the conjunctions "although" and concessive
"while", both arguments are provided structurally.)

Here we first illustrate Case 4 with two examples in which "nevertheless" occurs in 
the main clause of a sentence containing a preposed subordinate clause. The subordinate 
conjunction helps clarify the relation between the clauses that forms the basis for the 
presupposed defeasible rule. After this, we give a further example where the relation 
between the adjacent clauses comes through inference. 

(55) While John is discussing politics, he is nevertheless thinking about his fish.

In (55), the conjunction "while" conveys a temporal relation R between the two clauses
it connects

during(e2, e1), where e1:discuss(john, politics) and e2:think_about(john, fish)

What "nevertheless" contributes to (55) is a defeasible rule based on this relation, which
we will write informally as

(during(X, E) ∧ E:discuss(Y, politics)) > ¬X:think_about(Y, fish)
Normally, whatever one does during the time one is discussing politics,
it is not thinking about one's fish.

This rule uses Asher and Morreau's (1991) defeasible implication operator (>) and
abstracts over the individual (John), which seems appropriate for the general statement
conveyed by the present tense of the utterance.
Similarly, in

(56) Even after John has had three glasses of wine, he is nevertheless able to solve
difficult math problems,

the conjunction "after" contributes a relation between the two clauses it connects

after(e2, e1), where e1:drink(john, wine) and e2:solve(john, hard_problems)

What "nevertheless" contributes to this example is a defeasible rule that we will again
write informally as

(after(X, E) ∧ E:drink(Y, wine)) > ¬X:solve(Y, hard_problems)
Normally, whatever one is able to do after one has had three glasses of
wine, it is not solving difficult math problems.

Again, we have abstracted over the individual, as the presupposed defeasible rule associ- 
ated with the present tense sentence appears to be more general than a statement about 
a particular individual. 
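
The way the presupposed rule is built from the relation R and its two event arguments can be sketched as follows. This is an illustrative sketch under our own assumptions: events are reduced to a predicate with an agent and a theme, and `Event` and `presupposed_rule` are hypothetical names; the `generalise` flag models the choice between abstracting over the individual and keeping the rule about John himself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str  # e.g. "e1"
    pred: str   # e.g. "discuss"
    agent: str  # e.g. "john"
    theme: str  # e.g. "politics"


def presupposed_rule(relation, consequent, antecedent, generalise=True):
    """Render the defeasible rule that "nevertheless" presupposes.

    `relation` is the name of R (e.g. "during"); `antecedent` and
    `consequent` are the events R holds between. With generalise=True the
    agent is replaced by the variable Y, as in the rules given for (55)
    and (56).
    """
    agent = "Y" if generalise else antecedent.agent
    ante = f"E:{antecedent.pred}({agent}, {antecedent.theme})"
    cons = f"X:{consequent.pred}({agent}, {consequent.theme})"
    return f"({relation}(X, E) ∧ {ante}) > ¬{cons}"


e1 = Event("e1", "discuss", "john", "politics")
e2 = Event("e2", "think_about", "john", "fish")

print(presupposed_rule("during", e2, e1))
print(presupposed_rule("during", e2, e1, generalise=False))
```

The sketch only renders the rule; deciding how general the accommodated rule should be is exactly the open question raised in the surrounding discussion.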

On the other hand, in the following example illustrating a presupposed defeasible rule 
and a discourse relation associated with adjacency, it seems possible for the presupposed 
defeasible rule to be about John himself. 



15 We speculate that the reason examples such as (55) and (56) sound more natural with the
focus particle "even" applied to the subordinate clause is that "even" conveys an even greater
likelihood that the defeasible rule holds, so "nevertheless" emphasises its failure to do so.






(57) John is discussing politics. Nevertheless, he is thinking about his fish. 

Here the discourse relation between the two clauses, each of which denotes a specific 
event, is 

during(e2, e1), where e1:discuss(john, politics) and e2:think_about(john, fish)

(N.B. Our LF representation isn't sufficiently rich to express the difference between (55)
and (57).) What "nevertheless" contributes here is the presupposed defeasible rule

during(X, e1) > ¬(X = e2)

Normally what occurs during John's discussing politics is not John thinking
about his fish.



Lagerwerf (1998) does not discuss how specific or general will be the presupposed
defeasible rule that is accommodated, nor what factors affect the choice. Kruijff-Korbayova
and Webber (2001a) also punt on the question, when considering the effect of Information
Structure on what presupposed defeasible rule is associated with "although". Again,
this seems to be a topic for future work.



Summary 

We have indicated four ways in which we have found the relation associated with a 
discourse adverbial to interact with a relation R triggered by adjacency or conveyed by 
structural connectives or, in some cases, by another relational anaphor: 

1. σ separately serves as an argument to both Rα and R;

2. Rα(σ, ei) is an argument of R;

3. Rα is parasitic on R;

4. Rα is a defeasible rule that incorporates R.

We do not know whether this list is exhaustive or whether a discourse adverbial 
always behaves the same way vis-a-vis other relations. Moreover, in the process of setting 
down the four cases we discuss, we have identified several problems that we have not 
addressed, on which further work is needed. Still, we hope that we have convinced the 
reader of our main thesis - that by recognizing discourse adverbials as doing something 
different from simply signalling the discourse relation between adjacent discourse units 
and by considering their contribution as relations in their own right, one can begin to 
characterise different ways in which anaphoric and structural relations may themselves 
interact. 



4 Lexicalised Grammar for Discourse Syntax and Semantics 

The question we consider in this section is how the treatment we have presented of dis- 
course adverbials and structural connectives can be incorporated into a general approach 
to discourse interpretation. There are three possible ways. 

The first possible way is to simply incorporate our treatment of adverbials and
connectives into a sentence-level grammar, since such grammars already cover the syntax
of sentence-level conjunction (both coordinate and subordinate) and the syntax of
adverbials of all types. The problem with this is that sentence-level grammars - whether
phrasal or lexicalized - stop at explicit sentence-level conjunction and do not provide any
mechanism for forming the meaning of multi-clausal units that cross sentence-level
punctuation. Moreover, as we have already seen in Section 3, the interpretation of discourse
adverbials can interact with the implicit relation between adjacent sentences, as well as
with an explicitly signalled relation, so that a syntax and compositional semantics that
stops at the sentence will not provide all the structures and associated semantics needed
to build the structures and interpretations of interest.

Seg := SPunct Seg | Seg SPunct | SPunct |
       on the one hand Seg on the other hand Seg |
       not only Seg but also Seg

SPunct := S Punctuation

Punctuation := . | ; | : | ? | !

S := S Coord S | S Subord S | Subord S S | Sadv S |
     NP Sadv VP | S Sadv

Coord := and | or | but | so

Subord := although | after | because | before | ...

Sadv := DAdv | SimpleAdv

DAdv := instead | otherwise | for example | meanwhile | ...
SimpleAdv := yesterday | today | surprisingly | hopefully | ...

Figure 6

PS rules for a discourse grammar

The second possibility is to have a completely different approach to discourse-level
syntax and semantics than to sentence-level syntax and semantics, combining (for
example) a Definite Clause Grammar with Rhetorical Structure Theory. But as we and
others have already noted, this requires discourse semantics reaching further and further
into sentence-level syntax and semantics to handle relations between main and embedded
clauses, and between embedded clauses themselves, as in Example (58).

(58) If they're drunk and they're meant to be on parade and you go to their room 
and they're lying in a pool of piss, then you lock them up for a day. 
[The Independent, 17 June 1997] 

Thus it becomes harder and harder to distinguish the scope of discourse-level syntax and 
semantics from that at the sentence-level. 

The third possibility is to recognize the overlapping scope and similar mechanisms
and simply extend a sentence-level grammar and its associated semantic mechanisms to
discourse. Its additional responsibilities would be to account for the formation of larger
units of discourse from smaller units; the projection of discourse unit interpretation onto
the interpretation of the larger discourse units they participate in; and the effect of
discourse unit interpretation on the evolving discourse model. There are two styles of
grammar one could use for this: (a) a phrase-structure grammar (PSG), which is what
Polanyi and van den Berg (1996) use for discourse, or (b) a lexicalized grammar that
extends to discourse a sentence-level lexicalized grammar such as Tree-Adjoining Grammar
(Joshi, 1987; XTAG-Group, 2001) or Combinatory Categorial Grammar (CCG)
(Steedman, 1996; Steedman, 2000b).

The latter is what we argue for, even though TAG and CCG are weakly context-sensitive
(CS) and the power needed for a discourse grammar with no crossing dependencies
is only CF (Section 1.1). Our argument is based on our desire to use a discourse
grammar in Natural Language Generation (NLG). It is well-known that context-free
PSGs (CF PSGs) set up a complex search space for NLG. A discourse grammar specified
in terms of phrase structure rules such as those shown in Figure 6 does not provide
sufficient guidance when reversed for use in generating discourse. For example, one might
end up having to guess randomly how many sentences and connectives one had, in what
order, before being able to fill the sentences and connectives in with any content. More
generally, trying to generate exactly a given semantics when semantics underspecifies
syntactic dependency (as discourse semantics must, on our account) is known to be
intractable (Koller and Striegnitz, 2002). An effective solution is to generate semantics and
syntax simultaneously, which is straightforward with a lexicalized grammar (Stone et al.,
2001).
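
The search-space concern can be illustrated with a small Python sketch. It rests on our own assumptions: `GRAMMAR` is a simplified, hypothetical subset of the phrase-structure rules in Figure 6, `CLAUSE` is a placeholder terminal standing for a clause whose content has not yet been chosen, and `expand` is a naive bounded enumerator, not a generation algorithm from the literature.

```python
from itertools import product

# Simplified, hypothetical subset of the Figure 6 rules. "CLAUSE" is a
# placeholder terminal for a clause that has not yet been given content.
GRAMMAR = {
    "Seg": [["SPunct", "Seg"], ["SPunct"]],
    "SPunct": [["S", "."]],
    "S": [["S", "Coord", "S"], ["Subord", "S", "S"], ["DAdv", "S"], ["CLAUSE"]],
    "Coord": [["and"], ["but"], ["so"]],
    "Subord": [["although"], ["because"]],
    "DAdv": [["otherwise"], ["for example"]],
}


def expand(symbol, depth):
    """Return all terminal skeletons derivable from `symbol` using
    derivation trees of height at most `depth`."""
    if symbol not in GRAMMAR:   # terminal: yields itself
        return {(symbol,)}
    if depth == 0:              # no budget left to rewrite a nonterminal
        return set()
    skeletons = set()
    for rhs in GRAMMAR[symbol]:
        parts = [expand(sym, depth - 1) for sym in rhs]
        for combo in product(*parts):
            skeletons.add(tuple(tok for part in combo for tok in part))
    return skeletons


# Number of distinct sentence skeletons a top-down generator would have to
# choose among before any clause content has been decided.
for depth in (1, 2, 3):
    print(depth, len(expand("S", depth)))
```

Even in this reduced grammar the number of candidate skeletons grows quickly with depth, and nothing in the rules themselves indicates which skeleton suits the content to be conveyed.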

Given the importance of various types of inference in discourse understanding, there 
is a second argument for using a lexicalized discourse grammar, which derives from the 
role of implicature in discourse. Gricean reasoning about implicatures requires a hearer 
be able to infer the meaningful alternatives that a speaker had in composing a sentence. 
With lexicalization, these alternatives can be given by a grammar, allowing the hearer, 
for example, to ask sensible questions like "Why did the speaker say 'instead' here instead 
of nothing at all?" and draw implicatures from this. A CF PSG, on the other hand, might 
suggest questions like "Why did the speaker say two sentences rather than one here?", 
which seem empirically not to lead to any real implicatures. (On the contrast between
choices, which seem to lead to implicatures, and mere alternative linguistic formulations,
which do not seem to, see for example (Dale and Reiter, 1995; Levinson, 2000).)



In several previous papers (Webber, Knott, and Joshi, 2001; Webber et al., 1999a;
Webber et al., 1999b), we described how our approach fits into the framework of Tree
Adjoining Grammar. This has led to the initial version of a discourse parser (Forbes et
al., 2001) in which the same parser that builds trees for individual clauses using
clause-level LTAG trees then combines them using discourse-level LTAG trees. Here we simply
outline the grammar, called DLTAG (Section 4.1), and then show how it supports the
approach to structural and anaphoric discourse connectives presented earlier (Section 4.2).

(Of course, one still needs to account for how speakers realise their intentions through
text and how what is achieved through a single unit of text contributes to what a speaker
hopes to achieve through any larger unit it is embedded in. Preliminary accounts are
given in (Grosz and Sidner, 1990; Moser and Moore, 1996). However, given the complex
relation between individual sentences and speaker intentions, it is unlikely that the
relation between multi-sentence discourse and speaker intentions can be modelled in a
straightforward way similar to the basically monotonic compositional process that we
have discussed in this paper for discourse semantics.)



4.1 DLTAG and Discourse Syntax 

A lexicalized TAG begins with the notion of a lexical anchor, which can have one or more 
associated tree structures. For example, the verb likes anchors one tree corresponding to 
John likes apples, another corresponding to the topicalized Apples John likes, a third 
corresponding to the passive Apples are liked by John, and others as well. That is, there 
is a tree for each minimal syntactic construction in which likes can appear, all sharing 
the same predicate-argument structure. This syntactic/semantic encapsulation is possible
because of the extended domain of locality of LTAG.

A lexicalized TAG contains two kinds of elementary trees: initial trees that reflect
basic functor-argument dependencies and auxiliary trees that introduce recursion and 
allow elementary trees to be modified and/or elaborated. Unlike the wide variety of 
trees needed at the clause level, we have found that extending a lexicalized TAG to 
discourse only requires a few elementary tree structures, possibly because clause-level 
syntax exploits structural variation in ways that discourse doesn't.

Figure 7

Initial trees (a-b) for a subordinate conjunction: (a) α:subconj_mid and (b) α:subconj_pre.
Dc stands for "discourse clause", ↓ indicates a substitution site, while "subconj" stands
for the particular subordinate conjunction that anchors the tree.







Figure 8

An initial tree for parallel constructions. This particular one is for a contrastive construction
anchored by "on the one hand" and "on the other hand".



4.1.1 Initial Trees DLTAG has initial trees associated with subordinate conjunctions,
with parallel constructions, and with some coordinate conjunctions. We describe each in
turn.

In the large LTAG developed by the XTAG project (XTAG-Group, 2001), subordinate
clauses are seen as adjuncts to sentences or verb phrases - i.e., as auxiliary trees -
because they are outside the domain of locality of the verb. In DLTAG, however, it is
predicates on clausal arguments (such as coordinate and subordinate conjunctions) that
define the domain of locality. Thus, at this level, these predicates anchor initial trees
into which clauses substitute as arguments. Figure 7 shows the initial trees for postposed
subordinate clauses (a) and preposed subordinate clauses (b). At both leaves and root
is a discourse clause (Dc) - a clause or a structure composed of discourse clauses.

One reason for taking something to be an initial tree is that its local dependencies 
can be stretched long-distance. At the sentence-level, the dependency between apples and 
likes in apples John likes is localized in all the trees for likes. This dependency can be 
stretched long-distance, as in Apples, Bill thinks John may like. In discourse, as we noted 
in Section 0, local dependencies can be stretched long-distance as well - as in 

(59) a. Although John is generous, he's hard to find. 

b. Although John is generous - for example, he gives money to anyone who asks 
him for it - he's hard to find. 



(60) a. On the one hand, John is generous. On the other hand, he's hard to find. 

b. On the one hand, John is generous. For example, suppose you needed some 
money: You'd only have to ask him for it. On the other hand, he's hard to 
find. 

Thus DLTAG also contains initial trees for parallel constructions as in (60). Such an
initial tree is shown in Figure 8. Like some initial trees in XTAG (XTAG-Group, 2001),
such trees can have a pair of anchors. Since there are different ways in which discourse
units can be parallel, we assume a different initial tree for contrast ("on the one hand"...
"on the other (hand)"...), disjunction ("either"... "or"...), addition ("not only"... "but
also"...), and concession ("admittedly"... "but"...).

Finally, there are initial trees for structural connectives between adjacent sentences or
clauses that convey a particular relation between the connected units. One clear example
is "so", conveying result. Its initial tree is shown in Figure 9. We will have a better sense
of what other connectives to treat as structural as a result of annotation efforts of the
sort described in (Creswell et al., 2002).

16 While in an earlier paper (Webber and Joshi, 1998), we discuss reasons for taking the lexical
anchors of the initial trees in Figures 7 and 8 to be feature structures, following the analysis in
(Knott, 1996; Knott and Mellish, 1996), here we just take them to be specific lexical items.

Figure 9

Initial tree for coordinate conjunction "so" (α:so).

Figure 10

Auxiliary trees for basic elaboration. These particular trees are anchored by (a) the
punctuation mark "." and (b) "and". The symbol * indicates the foot node of the auxiliary
tree, which has the same label as its root. (c) Auxiliary tree for the discourse adverbial "then".

4.1.2 Auxiliary Trees DLTAG uses auxiliary trees in two ways: (a) for discourse units 
that continue a description in some way; and (b) for discourse adverbials. Again we 
describe each in turn. 

First, auxiliary trees anchored by punctuation (e.g. period, comma, semi-colon, etc.)
(Figure 10a) or by simple coordination (Figure 10b) are used to provide further
description of a situation or of one or more entities (objects, events, situations, states, etc.)
within the situation. The additional information is conveyed by the discourse clause
that fills its substitution site. Such auxiliary trees are used in the derivation of simple
discourses such as:

(64) a. John went to the zoo.

b. He took his cell phone with him.

Figure 11 shows the DLTAG derivation of Example (64), starting from LTAG derivations
of the individual sentences. To the left of the arrow (→) are the elementary trees
to be combined: T1 stands for the LTAG tree for clause (64a), T2 for clause (64b), and
β:punct1 for the auxiliary tree associated with the full stop after (64a). In the derivation,
the foot node of β:punct1 is adjoined to the root of T1 and its substitution site filled by
T2, resulting in the tree to the right of →. (A standard way of indicating TAG derivations
is shown under →, where dashed lines indicate adjunction, and solid lines, substitution.
Each line is labelled with the address of the argument at which the operation occurs. τ1
is the derivation tree for T1, and τ2, the derivation tree for T2.)

Figure 11

TAG derivation of Example (64)

17 For example, one might also have initial trees for marked uses of "and" and "or", that have a
specific meaning beyond simple conjunction or disjunction as in

(61) a. Throw another spit ball and you'll regret it.
b. Eat your spinach or you won't get dessert.

These differ from the more frequent, simple coordinate uses of "and" and "or" in that the second
conjunct in these marked cases bears a discourse relation to the first conjunct (result in both (61a)
and (61b)). With simple coordinate uses of "and" and "or", all conjuncts (disjuncts) bear the same
relation to the same immediately left-adjacent discourse unit. For example, in (62), each conjunct is
a separate explanation for not trusting John, while in (63), each disjunct conveys an alternative
result of John's good fortune.

(62) You shouldn't trust John. He never returns what he borrows, and he bad-mouths his associates
behind their backs.

(63) John just won the lottery. So he will quit his job, or he will at least stop working overtime.

For simple coordinate uses of "and" and "or", we have auxiliary trees (Section 4.1.2).

18 The latter use of an auxiliary tree is related to dominant topic chaining in (Scha and Polanyi, 1985)
and entity chains in (Knott et al., 2001).

19 We comment on left-to-right incremental construction of DLTAG structures in parallel with
sentence-level LTAG structures at the end of Section 4.2.

The other auxiliary trees used in the lexicalised discourse grammar are those for
discourse adverbials, which are simply auxiliary trees in a sentence-level LTAG
(XTAG-Group, 2001), but with an interpretation that projects up to the discourse level. An
example is shown in Figure 10c. Adjoining such an adverbial to a clausal/sentential
structure contributes to how information conveyed by that structure relates to the
previous discourse.
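
The two TAG operations used in the derivation of Example (64) can be sketched in Python. This is a minimal sketch under our own assumptions: clauses are treated as atomic leaves rather than full LTAG trees, the shape given to β:punct1 (foot node, punctuation anchor, substitution site) is our reading of the description of Figure 10a, and `Node`, `adjoin_at_root`, `substitute`, and `yield_of` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False          # foot node of an auxiliary tree (marked *)
    subst: bool = False         # open substitution site (marked ↓)
    text: Optional[str] = None  # lexical material at a leaf


def adjoin_at_root(aux, target):
    """Adjoin auxiliary tree `aux` at the root of `target`:
    the foot node of `aux` is replaced by `target`."""
    def rebuild(n):
        if n.foot:
            if n.label != target.label:
                raise ValueError("foot node and target root labels differ")
            return target
        return Node(n.label, [rebuild(c) for c in n.children], n.foot, n.subst, n.text)
    return rebuild(aux)


def substitute(tree, arg):
    """Fill the leftmost substitution site labelled like `arg` with `arg`."""
    filled = False

    def rebuild(n):
        nonlocal filled
        if n.subst and not filled and n.label == arg.label:
            filled = True
            return arg
        return Node(n.label, [rebuild(c) for c in n.children], n.foot, n.subst, n.text)

    result = rebuild(tree)
    if not filled:
        raise ValueError("no matching substitution site")
    return result


def yield_of(n):
    """Left-to-right lexical material of a derived tree."""
    if n.text is not None:
        return [n.text]
    return [w for c in n.children for w in yield_of(c)]


# Clause-level trees, simplified to single leaves (assumption of this sketch).
T1 = Node("Dc", text="John went to the zoo")
T2 = Node("Dc", text="He took his cell phone with him")

# β:punct1 as assumed here: Dc -> Dc*  "."  Dc↓
punct1 = Node("Dc", [Node("Dc", foot=True), Node("Punct", text="."), Node("Dc", subst=True)])

derived = substitute(adjoin_at_root(punct1, T1), T2)
print(yield_of(derived))
```

Because the root of the derived tree is again labelled Dc, a further auxiliary tree (for example, one anchored by a discourse adverbial) can adjoin to it in the same way.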

There is some lexical ambiguity in this grammar, but no more than serious
consideration of adverbials and conjunctions demands. First, as already noted, discourse
adverbials have other uses that may not be anaphoric (65a-b) and may not be clausal
(65a-c):

(65) a. John ate an apple instead of a pear. 

b. In contrast with Sue, Fred was tired. 

c. Mary was otherwise occupied. 

Secondly, many of the adverbials found in second position in parallel constructions (e.g.,
"on the other hand", "at the same time", "nevertheless") can also serve as simple
adverbial discourse connectives on their own. In the first case, they will be one of the
two anchors of an initial tree (Figure 8), while in the second, they will anchor a simple
auxiliary tree (Figure 10c). These lexical ambiguities correlate with structural ambiguity.

4.2 Example Derivations 

It should be clear by now that our approach aims to explain discourse semantics in terms 
of a product of the same three interpretive mechanisms that operate within clause-level 
semantics: 

• compositional rules on syntactic structure (here, discourse structure) 

• anaphor resolution 

• inference triggered by adjacency and structural connection. 

For the compositional part of semantics in DLTAG (in particular, computing 
interpretations on derivation trees), we follow Joshi and Vijay-Shanker (2001). Roughly, they 
compute interpretations on the derivation tree by a bottom-up procedure. At each level, 
function-application is used to assemble the interpretation of the tree from the 
interpretation of its root node and its subtrees. Where multiple subtrees have function types, 
the interpretation procedure is potentially nondeterministic: The resulting ambiguities 
in interpretation may be admitted as genuine, or they may be eliminated by a lexical 
specification. Multi-component TAG tree-sets are used to provide an appropriate 
compositional treatment for quantifiers, which we borrow for interpreting "for example" 
(Examples 66c-d). 
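The bottom-up procedure can be illustrated with a minimal sketch. It is not the procedure of Joshi and Vijay-Shanker (2001) itself: the class `DerivationNode`, the function `interpret`, and the use of strings for eventualities are all assumptions made for illustration, and the toy tree encodes the "because" sentence (66a) discussed below.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

# A meaning is either an atom (a string naming an eventuality) or a
# function that waits for the meanings of the node's subtrees.
Meaning = Union[str, Callable[..., str]]


@dataclass
class DerivationNode:
    """One node of a derivation tree: an elementary tree plus its children."""
    name: str
    meaning: Meaning
    children: List["DerivationNode"] = field(default_factory=list)


def interpret(node: DerivationNode) -> str:
    """Interpret the children first, then apply the node's meaning to them."""
    child_values = [interpret(child) for child in node.children]
    if callable(node.meaning):
        return node.meaning(*child_values)
    return node.meaning


# Hypothetical encoding of (66a); children are listed in address order.
t1 = DerivationNode("T1", "interp(T1)")
t2 = DerivationNode("T2", "interp(T2)")
because = DerivationNode(
    "because-mid",
    lambda left, right: f"explanation({right}, {left})",
    [t1, t2],
)

print(interpret(because))
```

The sketch is deterministic because only the root carries a function type; it does not model the nondeterminism that arises when several subtrees have function types.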

We show here rather informally how DLTAG and an interpretative process on its 
derivations operate. We start with two previous examples (repeated here as 66c and 66d) 
and two somewhat simpler variants (66a-b): 



(66) a. You shouldn't trust John because he never returns what he borrows. 

b. You shouldn't trust John. He never returns what he borrows. 

c. You shouldn't trust John because, for example, he never returns what he 
borrows. 

d. You shouldn't trust John. For example, he never returns what he borrows. 
Figure 12 

Derivation of Example 66a. The derivation tree is shown below the arrow, and the derived 
tree, to its right. (Node labels Dc have been omitted for simplicity.) 

Figure 13 

Derivation of Example 66b 



This will allow us to show how (66a-b) and (66c-d) receive similar interpretations, despite 
having somewhat different derivations, and how the discourse adverbial "for example" 
contributes both syntactically and semantically to those interpretations. 

We let T1 stand for the LTAG parse tree for "you shouldn't trust John", τ1, its 
derivation tree, and interp(T1), the eventuality associated with its interpretation. 
Similarly, we let T2 stand for the LTAG parse tree for "he never returns what he borrows", 
τ2, its derivation tree, and interp(T2), the eventuality associated with its interpretation. 

Example 66a involves an initial tree (α:because-mid) anchored by "because" 
(Figure 12). Its derived tree comes from T1 substituting at the left-hand substitution site of 
α:because-mid (index 1) and T2 at its right-hand substitution site (index 3). 
Compositional interpretation of the resulting derivation tree yields 
explanation(interp(T2), interp(T1)). 
(A more precise interpretation would distinguish between the direct and epistemic 
causality senses of "because", but the derivation would proceed in the same way.) 

In contrast with (66a), Example 66b employs an auxiliary tree (β:punct1) anchored 
by full-stop "." (Figure 13). Its derived tree comes from T2 substituting at the right-hand 
substitution site (index 3) of β:punct1, and β:punct1 adjoining at the root of T1 (index 0). 
Compositional interpretation of the derivation tree yields merely that T2 continues the 
description of the situation associated with T1 - i.e., elaboration(interp(T2), interp(T1)). 
Further inference triggered by adjacency and structural connection leads to a conclusion 
of causality between them - i.e., explanation(interp(T2), interp(T1)), but this conclusion 
is defeasible because it can be denied without a contradiction - e.g. 

(67) You shouldn't trust John. He never returns what he borrows. But that's not why 
you shouldn't trust him. 
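The difference between the compositional relation and the defeasible one can be sketched as follows. This is only an illustration under stated assumptions: the names `Relation` and `interpret_adjacent` are hypothetical, and the causal default is hard-wired here, whereas real inference would depend on world knowledge.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relation:
    name: str
    args: tuple
    defeasible: bool = False


def interpret_adjacent(first: str, second: str,
                       denied: frozenset = frozenset()) -> List[Relation]:
    """Relations holding between two adjacent sentences joined by a full stop."""
    # Compositional step: always present and not cancellable.
    relations = [Relation("elaboration", (second, first))]
    # Inferential step: a default conclusion that later text may deny.
    if "explanation" not in denied:
        relations.append(
            Relation("explanation", (second, first), defeasible=True)
        )
    return relations


# (66b): nothing denies the causal reading, so both relations are returned.
reading_66b = interpret_adjacent("interp(T1)", "interp(T2)")

# (67): "But that's not why ..." denies the explanation; elaboration remains.
reading_67 = interpret_adjacent(
    "interp(T1)", "interp(T2)", denied=frozenset({"explanation"})
)
```

Denying the explanation removes only the inferred relation, which mirrors the observation that (67) is not contradictory.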

Example 66c differs from (66a) in containing "for example" in its second clause. As 
noted earlier, "for example" resembles a quantifier with respect to its semantics, as its 
interpretation takes wider scope than would be explained by its syntactic position. We 
handle this in the same way that quantifiers are handled in (Joshi and Vijay-Shanker, 
2001), by associating with "for example" a two-element TAG tree-set (Figure 14). Both 
trees in the tree-set participate in the derivation: the auxiliary tree β:for_ex1 adjoins 
at the root of T2, while the auxiliary tree β:for_ex2 adjoins at the root of the higher 
discourse unit. Since we saw from Example 66a that the interpretation of this higher 
discourse unit is explanation(interp(T2), interp(T1)), the interpretation associated with 
Figure 14 

Derivation of Example 66c 

Figure 15 

Derivation of Example 66d 



the adjoined β:for_ex2 node both embeds and abstracts this interpretation, yielding 

exemplification(interp(T2), λX . explanation(X, interp(T1))) 

That is, John's never returning what he borrows is one instance of a set of explanations. 

Similarly, Example 66d differs from (66b) in containing "for example" in its second 
sentence. As in Example 66b, an inferred relation is triggered between the interpretations 
of T2 and T1, namely explanation(interp(T2), interp(T1)). Then, as a result of β:for_ex1 
adjoining at T2 and β:for_ex2 adjoining at the root of the higher discourse unit, "for 
example" again contributes the interpretation 

exemplification(interp(T2), λX . explanation(X, interp(T1))) 

Thus (66c) and (66d) only differ in the derivation of the interpretation that "for example" 
then abstracts over. 
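The "embed and abstract" step can be made concrete with a small sketch. It assumes nested tuples as stand-ins for logical terms and a Python closure as the abstraction λX; the names `Exemplification` and `for_example` are hypothetical and are not part of DLTAG.

```python
from dataclasses import dataclass
from typing import Callable


def explanation(cause: str, effect: str) -> tuple:
    """Build the term explanation(cause, effect)."""
    return ("explanation", cause, effect)


@dataclass
class Exemplification:
    instance: str                        # interp(T2)
    abstraction: Callable[[str], tuple]  # λX . explanation(X, interp(T1))

    def instantiated(self) -> tuple:
        """Apply the abstraction to the instance it exemplifies."""
        return self.abstraction(self.instance)


def for_example(instance: str,
                relation: Callable[[str, str], tuple],
                other: str) -> Exemplification:
    """Abstract the relation over its first argument, then embed the result."""
    return Exemplification(instance, lambda x: relation(x, other))


# The same call serves (66c) and (66d): only the source of the
# explanation relation (compositional versus inferred) differs.
reading = for_example("interp(T2)", explanation, "interp(T1)")
print(reading.instantiated())
```

Applying the abstraction to its instance recovers the original explanation relation, while other values of X would yield further members of the set of explanations.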

The next example we will walk through is an earlier example, repeated here as Example 68. 

(68) John loves Barolo. So he ordered three cases of the '97. But he had to cancel the 
order because then he discovered he was broke. 

As shown in Figure 16, this example involves two initial trees (α:so, α:because-mid) for the 
structural connectives "so" and "because"; an auxiliary tree for the structural connective 
"but" (β:but), since "but" functions as a simple conjunction to continue the description 
of the situation under discussion; an auxiliary tree (β:then) for the discourse adverbial 
"then"; and initial trees for the four individual clauses T1-T4. As can be seen from the 
derivation tree, T1 and T2 substitute into α:so as its first and third arguments, and β:but 
root-adjoins to the result. The substitution argument of β:but is filled by α:because-mid, 
with T3 and T4 substituted in as its first and third arguments, and β:then is root-adjoined 
to T4. The interpretation contributed by "then", after its anaphoric argument is resolved 
to interp(T2), is 

i4: after(interp(T4), interp(T2)). 

The interpretations derived compositionally from the structural connectives "so", 
"because" and "but" are: 

i1: result(interp(T2), interp(T1)) 

i2: explanation(interp(T4), interp(T3)) 

i3: elaboration(i2, i1) 

Further inference may then refine elaboration to contrast, based on how "but" is being used. 
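The division of labour in Example 68 can be sketched in a few lines. The sketch assumes strings for eventualities and a hand-written resolver; the helper names `relation` and `then` are hypothetical, and the antecedent of "then" is stipulated rather than computed.

```python
def relation(name: str, first: str, second: str) -> str:
    """Render a two-place discourse relation as a string."""
    return f"{name}({first}, {second})"


# Eventualities for the four clauses T1-T4 of Example 68.
interp = {n: f"interp(T{n})" for n in range(1, 5)}

# Structural connectives: both arguments come from the derivation tree.
i1 = relation("result", interp[2], interp[1])        # "so"
i2 = relation("explanation", interp[4], interp[3])   # "because"
i3 = relation("elaboration", "i2", "i1")             # "but"


def then(host: str, resolve) -> str:
    """Discourse adverbial: one argument is the host clause, the other is
    whatever the anaphor resolver returns."""
    return relation("after", host, resolve())


# The resolver is fixed by hand to pick the ordering event in T2.
i4 = then(interp[4], resolve=lambda: interp[2])

for label, value in [("i1", i1), ("i2", i2), ("i3", i3), ("i4", i4)]:
    print(f"{label}: {value}")
```

Only `then` takes a resolver, reflecting that the adverbial's second argument is not supplied by the derivation tree.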
Figure 17 

Derivation of Example 69b 



Finally, we want to point out one more way in which texts that seem to be close 
paraphrases get their interpretations in different ways. Consider the two texts in Example 69: 



(69) a. You should eliminate part2 before part3 because part2 is more susceptible to 
damage. 

b. You should eliminate part2 before part3. This is because part2 is more 
susceptible to damage. 



Example 69b is a simpler version of an example in (Moser and Moore, 1995), where 
"This is because" is treated as an unanalyzed cue phrase, no different from "because" in 
(69a). We show here that this isn't necessary: One can analyze (69b) using compositional 
semantics and anaphor resolution, and achieve the same results. 

First consider (69a). Given the interpretations of its two component clauses, its 
overall interpretation follows in the same way as (66a), shown in Figure 12. Now consider 
(69b) and the derivation shown in Figure 17. Here the initial tree α:because-mid has its 
two arguments filled by T2, the TAG analysis of "this is", and T3, the TAG analysis of 
"part2 is more susceptible to damage". The overall derived tree for (69b) comes from 
β:punct1 root-adjoining to T1 (the TAG analysis of "You should eliminate part2 before 
part3"), with the substitution site of β:punct1 filled by the α:because-mid derivation. 
The compositional interpretation of the derivation tree yields the interpretation of the 
α:because-mid tree (i1) as an elaboration of the interpretation of T1: 

i1: explanation(interp(T3), interp(T2)) 
i2: elaboration(i1, interp(T1)) 

But this is not all. The pronoun "this" in T2 is resolved anaphorically to the nearest 
consistent eventuality (Eckert and Strube, 2000; Byron, 2002), which in this case is 
interp(T1). Taking this as the interpretation of T2 and substituting, we get 

i1: explanation(interp(T3), interp(T1)) 
i2: elaboration(i1, interp(T1)) 



Notice that i1 is also the interpretation of (69a). To this, i2 adds the somewhat redundant 
information that i1 serves to elaborate the advice in T1. Thus (69a) and (69b) receive 
similar interpretations but by different means. This treatment has the added advantage 
that one does not have to treat "This is not because" as a separate cue phrase. Rather, 
negation simply produces 

i1: ¬explanation(interp(T3), interp(T1)) 
i2: elaboration(i1, interp(T1)) 

That is, Tl is elaborated by a denial of a (possible) explanation. Presumably, the text 
would go on to provide the actual explanation. 
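The two steps for (69b), compositional interpretation followed by resolution of "this", can be sketched as a substitution over terms. The nested-tuple encoding, the helper `substitute`, and the label T3 for the clause "part2 is more susceptible to damage" are assumptions made for this illustration; the antecedent of "this" is supplied by hand rather than by a resolution algorithm.

```python
from typing import Union

Term = Union[str, tuple]


def substitute(term: Term, old: str, new: str) -> Term:
    """Replace every occurrence of the atom `old` in a nested term by `new`."""
    if isinstance(term, tuple):
        return tuple(substitute(part, old, new) for part in term)
    return new if term == old else term


# Compositional result for (69b), before anaphor resolution.
i1 = ("explanation", "interp(T3)", "interp(T2)")
i2 = ("elaboration", i1, "interp(T1)")

# "this" (the subject of T2) is resolved to the eventuality of T1.
resolved_i1 = substitute(i1, "interp(T2)", "interp(T1)")
resolved_i2 = substitute(i2, "interp(T2)", "interp(T1)")

# "This is not because ..." negates only the inner relation;
# the elaboration that embeds it is unchanged.
negated_i1 = ("not", resolved_i1)
negated_i2 = ("elaboration", negated_i1, "interp(T1)")

print(resolved_i2)
print(negated_i2)
```

Negation needs no additional lexical entry in this sketch, which is the point made about "This is not because".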

Finally, we want to comment on the holy grail of discourse parsing: running it in 
parallel with incremental sentence-level parsing. Neither the analyses given in this section 
nor the discourse parser described in (Forbes et al., 2001) runs in parallel with 
incremental sentence-level parsing. But we believe that an approach grounded in a lexicalized 
grammar holds more promise for parallel, incremental sentence-discourse processing than 
either an approach that uses distinct mechanisms for the two, or an approach that uses 
phrase-structure rules for both. 

An approach to sentence-discourse processing that was both incremental and parallel 
would minimally require the following: 

• A left-to-right parser for the lexicalized grammar that would simultaneously 
compute increments to sentence-level syntactic structure, sentence-level 
semantics, discourse-level syntactic structure and discourse-level semantics. 
Increments to the latter two would only occur at clause boundaries and with 
discourse adverbials and structural connectives. 

• An incremental anaphor resolution mechanism, similar to that in (Strube, 
1998), but extended both to deictic pronouns, as in (Eckert and Strube, 2000; 
Byron, 2002), and to the anaphoric argument of discourse adverbials. 

• Incremental computation of discourse structure in terms of elaboration relations 
and further non-defeasible reasoning to more specific relations, where possible. 

A left-to-right parser that simultaneously produces sentence-level syntactic and 
semantic analyses already exists for combinatory categorial grammar (Steedman, 1996; 
Steedman, 2000b; Hockenmaier, Bierner, and Baldridge, To appear), and it would seem 
straightforward to extend such a parser to computing discourse-level syntax and 
semantics as well. Similarly, it seems straightforward to produce an incremental version 
of any of the current generation of anaphor resolution mechanisms, extended to deictic 
pronouns, although current approaches only attempt to resolve "this" and "that" 
with the interpretation of a single clause, not with that of any larger discourse unit. As 
these approaches are also not very accurate as yet, incremental anaphor resolution awaits 
improvements to anaphor resolution in general. Moreover, as we better understand the 
specific anaphoric properties of discourse adverbials through empirical analysis such as 
(Creswell et al., 2002), such anaphor resolution mechanisms can be extended to include 
them as well. 

As for building discourse structure incrementally in parallel with syntactic structure, 
there is no working prototype yet that will do what is needed, but we have no doubt 
that better understanding of semantics and researchers' reliable ingenuity will eventually 
succeed here as well. 



5 Conclusion 



In this paper, we have argued that discourse adverbials make an anaphoric, rather than 
a structural, connection with the previous discourse (Section |^) , and we have provided 
a general view of anaphora in which it makes sense to talk of discourse adverbials as 
being anaphoric (Section ||). We have then shown that this view of discourse adverbials 
allows us to characterize a range of ways in which the relation contributed by a discourse 
adverbial can interact with the relation conveyed by a structural connective or inferred 
through adjacency (Section ||), and then shown how discourse syntax and semantics can 
be treated as an extension of sentence-level syntax and semantics, using a lexicalised 
discourse grammar (Section 4). 

We are clearly not the first to have proposed a grammatical treatment of low-level 
aspects of discourse semantics (Asher and Lascarides, 1999; Gardent, 1997; Polanyi and 
van den Berg, 1996; Scha and Polanyi, 1985; Schilder, 1997a; Schilder, 1997b; van den 
Berg, 1996). But we are the first to have recognised that a key to avoiding problems of 
maintaining a compositional semantics for discourse lies in recognizing discourse adver- 
bials as anaphors and not trying to shoe-horn everything into a single class of discourse 
connectives. While we are not yet able to propose a solution to the problem of correctly 
resolving discourse adverbials or a way of achieving the holy grail of computing discourse 
syntax and semantics in parallel with incremental sentence processing, the proposed ap- 
proach does simplify issues of discourse structure and discourse semantics in ways that 
have not before been possible. 